DIALOG (R) File 350:Derwent WPIX 
(c) Thomson Derwent. All rts. reserv. 

009949155 

WPI Acc No: 1994-216868/199426 

XRPX Acc No: N94-171354 

Method for assessing of task processing style of individual - defining of 
simulated situation for individual with scenario data, which are 
presented to him on computer controlled display device 

Patent Assignee: INTROSPECT TECHNOLOGIES INC (INTR-N) ; MARIHUGH S (MARI-I) 

Inventor: MARIHUGH S; OSTBY D L; OSTBY P S 

Number of Countries: 001 Number of Patents: 001 

Patent Family: 

Patent No Kind Date Applicat No Kind Date Week 

US 5326270 A 19940705 US 91751548 A 19910829 199426 B 

Priority Applications (No Type Date) : US 91751548 A 19910829 

Patent Details: 
Patent No Kind Lan Pg Main IPC Filing Notes 
US 5326270 A 42 G09B-007/00 

Abstract (Basic) : US 5326270 A 

The method involves presenting a simulated situation and recording 
the individual's responses while resolving the situation. A subject 
undergoing the assessment is asked to assume the responsibilities of an 
Assistant Superintendent of Parks, replacing an individual who has 
unexpectedly left that position. The subject is first trained in the 
use of a touch-sensitive screen display for accessing data that may be 
useful in fulfilling the responsibilities of the simulated position and 
for providing input data used in the exercise. 

Each action by the subject undergoing the assessment is recorded in 
a raw data stream, along with the time that it occurred, and is 
statistically analyzed with respect to several parameters that 
define the subject's task-processing style. These parameters are useful 
in determining whether an individual is suitable for a job and for 
other assessment purposes, or can be used for training a subject to 
improve the subject's ability and efficiency in dealing with tasks. 

USE/ADVANTAGE - For psychological testing of individual's response 
and behaviour when presented with problem. Provision for training 
person, showing how person solves problems. 
Derwent Class: P85; S05; T01; W04 
International Patent Class (Main) : G09B-007/00 
File Segment: EPI; EngPI 

Manual Codes (EPI/S-X) : S05-D01C5A; T01-J03; T01-J12A; W04-W07 



Results from patent-citation searching based on US 6,714,952 
9A9/4 (Item 4 from file: 350) 

DIALOG (R) File 350:Derwent WPIX 
(c) Thomson Derwent. All rts. reserv. 
014433282 

WPI Acc No: 2002-253985/200230 

Related WPI Acc No: 1999-120266; 2004-060849 

XRPX Acc No: N02-196090 

Application software package installation for local area network, 
involves generating application installation package based on differences 
between pre-installation and post-installation system snap-shots of 
software 

Patent Assignee: INTEL CORP (ITLC) 
Inventor: LUU L 

Number of Countries: 001 Number of Patents: 001 
Patent Family: 

Patent No Kind Date Applicat No Kind Date Week 

US 6324690 B1 20011127 US 93130097 A 19930930 200230 B 

US 96591222 A 19960118 

US 97859277 A 19970519 

US 98127116 A 19980729 



Priority Applications (No Type Date) : US 93130097 A 19930930; US 96591222 A 

19960118; US 97859277 A 19970519; US 98127116 A 19980729 
Abstract (Basic) : US 6324690 B1 

NOVELTY - An application installation package is generated based on 
differences between pre-installation and post-installation system 
snap-shots of software on source workstation (201). The package is 
installed on the user workstation (202) based on application 
installation package and default personality file received from the 
source workstation. The default personality file describes default 
installation parameters for the software package. 
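
The snapshot-difference idea in the NOVELTY paragraph can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes each system snapshot is a plain mapping from a path to a content fingerprint, and the names build_install_package and apply_package are hypothetical.

```python
def build_install_package(pre, post):
    """Derive an installation package from two snapshots.

    Assumption: each snapshot is a dict mapping a path to a content
    fingerprint. The real snapshot format is not given in the abstract.
    """
    added = {path: post[path] for path in post.keys() - pre.keys()}
    changed = {
        path: post[path]
        for path in post.keys() & pre.keys()
        if post[path] != pre[path]
    }
    removed = sorted(pre.keys() - post.keys())
    return {"added": added, "changed": changed, "removed": removed}


def apply_package(state, package):
    """Replay the recorded differences on another workstation's state."""
    result = dict(state)
    result.update(package["added"])
    result.update(package["changed"])
    for path in package["removed"]:
        result.pop(path, None)
    return result


# Hypothetical pre- and post-installation snapshots of a source workstation.
pre_snapshot = {"win.ini": "v1", "autoexec.bat": "v1", "old.tmp": "v1"}
post_snapshot = {"win.ini": "v2", "autoexec.bat": "v1", "app/app.exe": "v1"}
package = build_install_package(pre_snapshot, post_snapshot)
```

Applying the package to a workstation that starts in the pre-installation state reproduces the post-installation state; the default personality file mentioned in the abstract is not modelled here.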

DETAILED DESCRIPTION - INDEPENDENT CLAIMS are also included for the 
following: 

(a) Computer system; 

(b) Recorded medium with application software package installation 
program 

USE - For local area network. 

ADVANTAGE - Allows a LAN administrator to install application 
software on user's workstation automatically at any time without user's 
intervention. 

DESCRIPTION OF DRAWING (S) - The figure shows the block diagram of 
the local area network administrator workstation and user workstation. 
Source workstation (201) 
User workstation (202) 
pp; 49 DwgNo 3/6 
Electric circuit conductor inspection apparatus creates inspection 
attribute of cross section configuration of conductor by sensing 
reflectivity and luminescence at conductor location 

Patent Assignee: ORBOTECH LTD (ORBO-N) 

Inventor: MARKOV I; SAVAREIGO N 

Number of Countries: 001 Number of Patents: 001 
Patent Family: 

Patent No Kind Date Applicat No Kind Date Week 

US 20020039182 A1 20020404 US 2000237803 P 20001004 200247 B 

US 2001939682 A 20010828 

Priority Applications (No Type Date): US 2000237803 P 20001004; US 
2001939682 A 20010828 

Patent Details: 

Patent No Kind Lan Pg Main IPC Filing Notes 

US 20020039182 A1 15 G01N-021/00 Provisional application US 2000237803 
Abstract (Basic) : US 20020039182 A1 

NOVELTY - The inspection apparatus senses the reflectivity and 
luminescence at the conductor location to determine the top width 
dimension and bottom width dimension respectively, to create an 
inspection attribute of cross section configuration of the conductor 
using an impedance analyzer. 

DETAILED DESCRIPTION - INDEPENDENT CLAIMS are also included for the 
following : 

(a) Electrical circuit inspection method; 

(b) Method for manufacturing electrical circuit 

USE - Used in the field of electric circuit inspection especially 
during PCB manufacture. 

ADVANTAGE - By determining the top and bottom width dimension, any 
defect in manufacturing process to fabricate an electrical circuit can 
be determined. Comparison of the respective widths of the bottom and 
top surfaces of conductor provides an indication of the slope of the 
side walls of the conductor. Statistical information about 
uniformity in the widths of conductors along top and bottom surfaces is 
used to indicate flaws in etching process. 

DESCRIPTION OF DRAWING (S) - The figure shows the functional block 
diagram of automated optical inspection system to inspect electrical 
circuits for defects. 
Multi-tier client/server system comprises clients having browsers for 
processing documents and maintaining connectivity with relational 
database servers and for executing application logic 

Patent Assignee: SARKAR S S (SARK-I) 

Inventor: SARKAR S S 

Number of Countries: 001 Number of Patents: 001 
Patent Family: 

Patent No Kind Date Applicat No Kind Date Week 

US 6418448 B1 20020709 US 99455422 A 19991206 200268 B 

Priority Applications (No Type Date) : US 99455422 A 19991206 
Abstract (Basic) : US 6418448 B1 

NOVELTY - Several clients have browsers for processing documents in 
XML and RDF, for creating and maintaining thin client windows on demand 
for persistent connectivity through the internet. Relational database 
servers with application logic in form of object packages comprise user 
defined packages and method and operator interfaces. The object request 
broker services execute application logic in CORBA. 

DETAILED DESCRIPTION - The relational database servers comprise 
user defined packages for providing call specifications for set of 
interfaces to embed in SQL queries, for specifying operations over 
attribute values from multiple tables, for specifying interfaces where 
parameter type definition maps to another interface or to tables, 
and user defined packages where uniform resource identifiers are used 
to locate elements in schema objects. 

USE - Multi-tier client server system for navigating, querying and 
manipulating information using specifications in resource description 
framework and supporting multiple object relational database resources 
over the web. 

ADVANTAGE - Triggers queries for transactions through thin client 
windows for persistent communication with remote databases. 

DESCRIPTION OF DRAWING (S) - The figure shows the block diagram 
illustrating a single SQL query made over the relational database 
schema along with legacy and existing central databases. 
Derwent Class: T01 

International Patent Class (Main) : G06F-017/30 
File Segment: EPI 
Information distribution apparatus for e-commerce, receives response 
message, transmits another request message to product information 
resource and product information is sent to computer for inspection by 
requester 

Patent Assignee: CALL C G (CALL-I) 

Inventor: CALL C G 

Number of Countries: 001 Number of Patents: 001 
Patent Family: 

Patent No Kind Date Applicat No Kind Date Week 

US 6418441 B1 20020709 US 9849426 A 19980327 200270 B 

Priority Applications (No Type Date) : US 99316597 A 19990521; US 9849426 A 
19980327; US 2000621662 A 20000724 
Abstract (Basic) : US 6418441 B1 

NOVELTY - A web browser program specifies barcode of product by 
transmitting a request message containing parameter value to 
internet domain name system based on which system accesses the database 
and transmits a response message containing an internet address to the 
computer. The program receives response message, transmits another 
request message to system and product information is returned to 
computer for inspection. 

DETAILED DESCRIPTION - INDEPENDENT CLAIMS are included for the 
following : 

(1) Retail sales performing apparatus; and 

(2) Internet shoppers provision method. 

USE - For e-commerce related to food products, cosmetics, health 
care products, pharmaceuticals. 

ADVANTAGE - The company code portion of the universal product code 
is stored in cross-reference database, and the remaining product code 
is sent to manufacturer's server, thereby the size of the 
cross-referencing database is reduced and maintenance of database is 
simplified, efficiently and reliably. 

DESCRIPTION OF DRAWING (S) - The figure shows a diagram illustrating 
the inter-relationship of the principal data structure used to 
implement a product code translator. 

pp; 27 DwgNo 2/8 

Title Terms: INFORMATION; DISTRIBUTE; APPARATUS; RECEIVE; RESPOND; MESSAGE; 

TRANSMIT; REQUEST; MESSAGE; PRODUCT; INFORMATION; RESOURCE; PRODUCT; 

INFORMATION; SEND; COMPUTER; INSPECT 
Derwent Class: T01 

International Patent Class (Main) : G06F-017/60 
File Segment: EPI 

Manual Codes (EPI/S-X) : T01-N01A2 



Results from patent-citation searching based on US 6,714,952 10/700,178 
74/9/3 (Item 3 from file: 350) 

DIALOG (R) File 350:Derwent WPIX 
(c) Thomson Derwent. All rts. reserv. 

010895940 

WPI Acc No: 1996-392891/199639 

Related WPI Acc No: 1996-341806 

XRPX Acc No: N96-331134 

Spectral null sequences detecting method in communication channel - 
involves mapping each spectral null sequence to unique path of a cyclic 
successive states and edges through trellis by selectively outputting 
splitting counterpart states 

Patent Assignee: INT BUSINESS MACHINES CORP (IBMC) 

Inventor: FREDRICKSON L; KARABED R; SIEGEL P H; THAPAR H K 

Number of Countries: 001 Number of Patents: 001 

Patent Family: 

Patent No Kind Date Applicat No Kind Date Week 

US 5548600 A 19960820 US 94289811 A 19940812 199639 B 

US 94316597 A 19940929 

Priority Applications (No Type Date) : US 94316597 A 19940929; US 94289811 A 
19940812 
Abstract (Basic) : US 5548600 A 

The method involves tracking the spectral content of sequences of 
electrical signals with a Viterbi detector. The processing of the 
sequences by the Viterbi detector is governed according to an N stage 
trellis structure. Each spectral null sequence is mapped to a unique 
path of a cyclic successive states and edges through the trellis by 
selectively outputting splitting counterpart states. Pre-selected 
states and edges are pruned at pre-selected times modulo N in the 
trellis such that no pair of unique paths support the same spectral 
null sequence. A time-varying trellis structure is created for 
limiting the maximum length of dominant error events in the sequences. 

ADVANTAGE - Reduces complexity of method and device for generating 
and detecting matched spectral null (MSN) coded sequences. Generates 
and detects MSN sequences with constraints against quasi-catastrophic 
sequences without requiring substantial path memory to assure high 
probability of survivor path merging. 
Duplicatable magnetic tape media creation method - involves copying 
required data in source media onto destination media without copying 
padding data 

Patent Assignee: EMC CORP (EMCE-N) 

Inventor: MUTALIK M G 

Number of Countries: 001 Number of Patents: 001 
Patent Family: 

Patent No Kind Date Applicat No Kind Date Week 

US 5819297 A 19981006 US 95534433 A 19950927 199847 B 

Priority Applications (No Type Date) : US 95534433 A 19950927 
Abstract (Basic) : US 5819297 A 

The method involves creating a source media having a predetermined 
percentage of its capacity filled with padding data. The required data 
is stored in the source media. The required data in source media is 
then copied onto a destination media without copying padding data. 

ADVANTAGE - Creates duplicate tape media that do not have padding 
data reliably. Enables working with existing drive technology and 
input-output software drivers and label recorders. Enhances 
probability of fitting data onto tape media. 

Dwg. 1/5 

Title Terms: MAGNETIC; TAPE; MEDIUM; CREATION; METHOD; COPY; REQUIRE; DATA; 
System-Reliability Cumulative-Binomial Program: This program finds the 
probability required to yield a given system reliability 

(NTIS Tech Note) 

National Aeronautics and Space Administration, Washington, DC. 

Corp. Source Codes: 011249000 

Oct 89 1p 

Languages : English 

Journal Announcement: GRAI9001 

FOR ADDITIONAL INFORMATION: Contact: COSMIC, 112 Barrow Hall, University 
of Georgia, Athens, GA 30602; (404) 542-3265. Refer to NPO-17556/TN. 
NTIS Prices: Not available NTIS 
Country of Publication: United States 

This citation summarizes a one-page announcement of technology available 
for utilization. The cumulative-binomial computer program, NEWTONP, is one 
of a set of three programs that calculate cumulative binomial probability 
distributions for arbitrary inputs. The three programs, NEWTONP, CUMBIN 
(NPO-17555), and CROSSER (NPO-17557), can be used independently of one 
another. NEWTONP can be used by statisticians and users of statistical 
procedures, test planners, designers, and numerical analysts. The program 
has been used for calculations of reliability and availability. NEWTONP 
calculates the probability p required to yield a given system reliability V 
for a k-out-of-n system. It can also be used to determine the 
Clopper-Pearson confidence limits (either one-sided or two-sided) for 
the parameter p of a Bernoulli distribution. NEWTONP can also be used 
to determine Bayesian probability limits for a proportion (if the beta 
prior has positive integer parameters), the percentiles of incomplete beta 
distributions with positive integer parameters, the percentiles of F 
distributions in which both degrees of freedom are even, and the median 
plotting positions in probability plotting. The NEWTONP program is written 
in C. It was developed on an IBM AT computer with a numeric coprocessor 
using Microsoft C 5.0. The format of the program is interactive. It has 
been implemented under DOS 3.2 and has a memory requirement of 26K. 
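
The central calculation described above, finding the component probability p that yields a given reliability V for a k-out-of-n system, can be sketched as follows. This is not the NEWTONP source: the citation does not state the numerical method used, so the sketch swaps in plain bisection on the cumulative binomial sum, and the function names are hypothetical.

```python
from math import comb


def k_out_of_n_reliability(p, k, n):
    """Probability that at least k of n independent components work,
    each working with probability p (upper cumulative binomial sum)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def required_component_probability(target, k, n, tol=1e-12):
    """Find p such that the k-out-of-n reliability equals target.

    Assumes 1 <= k <= n and 0 < target < 1, so the reliability rises
    monotonically from 0 to 1 as p goes from 0 to 1. Bisection is used
    here for simplicity; it is an assumption, not the method of NEWTONP.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if k_out_of_n_reliability(mid, k, n) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For a 2-out-of-3 system the reliability is 3p^2 - 2p^3, so a target of V = 0.5 gives p = 0.5 by symmetry, which makes a convenient check of the search.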

Descriptors: *Software; *Probability distribution functions; 
*Reliability 

Identifiers: *Computer calculations; NTISNTND 

Section Headings: 72F (Mathematical Sciences--Statistical Analysis); 62B 
(Computers, Control, and Information Theory--Computer Software) 
PORT MIRRORING IN CHANNEL DIRECTORS AND SWITCHES 

Patent Applicant/Assignee: 

INRANGE TECHNOLOGIES CORPORATION 
Patent Applicant/Inventor: 

WOODRING Sherrie L, 3987 Farrcroft Drive, Fairfax, VA 22030, US, US 
Patent and Priority Information (Country, Number, Date) : 

Patent: WO 2002101987 A2-A3 20021219 (WO 02101987) 

Priority Application: US 2001297439 20010613 
English Abstract 

A storage area network that includes a monitoring component, wherein the 
monitoring component is capable of characterizing data flowing into or 
out of at least one port associated with a fiber channel director or 
switch so as to enable an operator to ascertain some usable information 
regarding the characterized data and/or its impact on the network. In 
many embodiments, the monitoring component provides a visual or audible 
signal to the operator regarding a particular data component. The present 
invention is further directed to methods for monitoring a storage area 
network, in particular, at least one port associated therewith. 
Claim 

1 A probe system adapted for use in a channel director comprising: 
at least one probe being capable of being associated with at least one 
port associated with said channel director; 
a mechanism for copying all ingress and egress data to/from a fiber 
channel port to the said probe for analysis. 

2 A probe system as claimed in claim 1, wherein said channel director is 
a storage area network. 

3 A probe system as claimed in claim 2, wherein said storage area network 
includes a fibre channel architecture. 

4 A probe system as claimed in claim 2, wherein said mechanism comprises 
a mirroring capability to copy the data associated with said port to said probe. 

5 A probe system as claimed in claim 1, wherein said probe is a software device. 

6 A probe system as claimed in claim 1, wherein said probe is a hardware device. 

7 A probe system as claimed in claim 1, wherein said mechanism reflects 
an optical energy signal on the transmit side of the port, wherein said 
optical energy is transmitted to said probe. 

8 A probe system as claimed in claim 7, wherein approximately 10 percent 
of said optical energy signal is reflected. 

9 A probe system as claimed in claim 1, wherein said mechanism reflects 
an optical energy signal on the receive side of a port, wherein said 
optical energy is transmitted to said probe. 

10 A probe system as claimed in claim 9, wherein approximately 10 percent 
of said optical energy signal is reflected. 

11 A probe system as claimed in claim 1, wherein said mechanism is an 
external fibre channel patch panel that replicates data for a given fibre 
channel port to said port. 

12 A probe system as claimed in claim 1, wherein said mechanism 
accomplishes an internal replication of data within a switch to a probe. 

13 A probe system as claimed in claim 1, wherein said mechanism 
accomplishes an internal replication of data within a director to said probe. 

14 A method for monitoring data ingress and egress in a storage area 
network comprising: providing at least one probe on at least one port 
associated with a device in said storage area network; mirroring a portion 
of a signal ingress and/or egress associated with said port using said probe 
to a monitoring location; obtaining information regarding data ingress 
and/or data egress obtained using said mirrored signal. 

15 A method as claimed in claim 14, further comprising generating 
statistics on the information provided by said probe. 

16 A method as claimed in claim 15, further comprising viewing said statistics. 
METHOD AND APPARATUS FOR RESTRICTING ACCESS TO A DATABASE ACCORDING TO USER 
PERMISSIONS 
Patent Applicant/Assignee: 

HPL TECHNOLOGIES INC, Suite 400, 2033 Gateway Place, San Jose, CA 95110, 
Inventor (s) : 

GHUKASYAN Hovhannes, 155 Pacchetti Way, Mountain View, CA 94040, US, 
Patent and Priority Information (Country, Number, Date) : 

Patent: WO 200388084 A1 20031023 (WO 0388084) 

Priority Application: US 2002115196 20020402 

English Abstract 

A method and apparatus for restricted access to a database according to 
user permissions are described. A user permissions file (1007) residing 
on a server includes information of permissions related to database 
records, and which of those permissions are associated with individual 
users. A permissions manager (1006) also residing on the server manages 
user queries (1002) either directly by generating restricted queries 
(1008) that reflect only authorized access to database records for the 
user generating the query, or indirectly by downloading a permissions 
filter or information for a restricted parameters screen to the 
user's client, so as to generate the restricted query (1008) on the 
client. In any case, a database management system (1001) residing on the 
server receives the restricted query (1008) and generates a result (1003) 
by accessing only authorized database records for the user, and 
communicates the result (1003) back to the user's client. 
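
The "direct" path in this abstract, where the server rewrites a user query so that it touches only authorized columns and records, can be sketched as follows. Everything concrete here is an assumption made for illustration: the PERMISSIONS mapping stands in for the user permissions file (1007), the lots table and its columns are invented, and build_restricted_query is a hypothetical name rather than anything defined in the application.

```python
import sqlite3

# Hypothetical stand-in for the user permissions file: for each user,
# the table, the readable columns, and a row filter (column, value).
PERMISSIONS = {
    "alice": {
        "table": "lots",
        "columns": ("lot_id", "yield_pct"),
        "row_filter": ("fab", "A"),
    },
}


def build_restricted_query(user, requested_columns):
    """Return (sql, params) covering only what the user may see."""
    perm = PERMISSIONS[user]  # unknown users raise KeyError
    allowed = [c for c in requested_columns if c in perm["columns"]]
    if not allowed:
        raise PermissionError("no authorized columns requested")
    filter_column, filter_value = perm["row_filter"]
    # Identifiers come only from the trusted permissions mapping;
    # the filter value is passed as a bound parameter.
    sql = (
        f"SELECT {', '.join(allowed)} FROM {perm['table']} "
        f"WHERE {filter_column} = ?"
    )
    return sql, (filter_value,)


def run_demo():
    """Run a restricted query against an in-memory example database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE lots (lot_id TEXT, fab TEXT, yield_pct REAL, cost REAL)"
    )
    conn.executemany(
        "INSERT INTO lots VALUES (?, ?, ?, ?)",
        [("L1", "A", 91.5, 10.0), ("L2", "B", 88.0, 12.0), ("L3", "A", 95.0, 11.0)],
    )
    sql, params = build_restricted_query("alice", ["lot_id", "cost", "yield_pct"])
    rows = conn.execute(sql + " ORDER BY lot_id", params).fetchall()
    conn.close()
    return sql, rows
```

In the demo the requested cost column is silently dropped and rows from fab B are excluded, which mirrors the abstract's statement that the result is generated by accessing only authorized records.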
Claim 

26. A method for restricting access to a database according to user 
permissions, comprising: 
receiving a user identification provided by a user; 
generating information for a restricted parameters screen from 
information associated with said user identification so as to generate a 
restricted query through user selection of available options limited by 
tables, columns and records accessible to said user in a database; 
and providing said information for said restricted parameters screen so 
as to be made available to said user as part of an interface between said 
user and a database management system. 

27. The method according to claim 26, wherein said information for said 
restricted parameters screen comprises parameters information 
provided to said user interface so that said user interface displays said 
available options limited by tables, columns and records accessible 
to said user. 

28. An apparatus for restricting access to a database according to user 
permissions, comprising a server computer including a database and a 
database management system, said server computer configured to: 
receive a user identification associated with a user from a client 
computer; generate information for a restricted parameters screen 
from information associated with said user identification so as to 
generate a restricted query through selection by a user of said client 
computer of available options limited by tables, columns and records 
accessible to said user in a database; and 
download said information for said restricted parameters screen to 
said client computer to be made available to said user as part of an 
interface between said user and said database management system. 
Author Affiliation: Dept. of Phys. & Astron., Cardiff Univ., UK 
Journal: Semiconductor Science and Technology vol.16, no. 3 p. 140-3 
Publisher: IOP Publishing, 

Publication Date: March 2001 Country of Publication: UK 

CODEN: SSTEET ISSN: 0268-1242 
Material Identity Number: J690-2001-003 
Language: English Document Type: Journal Paper (JP) 
Abstract: The spontaneous emission and optical gain spectra from an 
InGaAs quantum dot laser have been independently measured under the same 
operating conditions. Using these spectra a combined 
probability-distribution function describing the electron occupancy in the 
conduction and valence bands has been experimentally determined. Comparison 
of this function with theoretical curves based on Fermi-Dirac statistics 
shows that for temperatures down to 100 K the carrier occupancy statistics 
are accurately described by thermal distributions. Measurements at 70 K 
show a breakdown of thermodynamic equilibrium indicated by non-thermal 
carrier distributions. (12 Refs) 
Mathematical expressions were derived for the exceedance rates and 
probability density functions of aircraft response 
variables using a turbulence model that consists of a low frequency 
component plus a variance modulated Gaussian turbulence component. The 
functional form of experimentally observed concave exceedance 
curves was predicted theoretically, the strength of the concave 
contribution being governed by the coefficient of variation of the time 
fluctuating variance of the turbulence. Differences in the functional 
forms of response exceedance curves and probability densities 
also were shown to depend primarily on this same coefficient of variation. 
Criteria were established for the validity of the local stationary 
assumption that is required in the derivations of the exceedance 
curves and probability density functions. These criteria 
are shown to depend on the relative time scale of the fluctuations in the 
variance, the fluctuations in the turbulence itself, and on the nominal 
duration of the relevant aircraft impulse response function. Metrics that 
can be generated from turbulence recordings for testing the validity 
of the local stationary assumption were developed. 

Descriptors: *Aircraft performance; *Atmospheric turbulence; *Gusts; 
Mathematical models; Atmospheric circulation; Earth atmosphere; 
Probability density functions 
The Two-Parameter Beta Method, introduced in the previous study as a 
method of estimating the operating characteristics of a test item, 
has proved to be as efficient as the Normal Approximation Method, for a set 
of simulated data of 500 hypothetical examinees having a uniform latent 
trait distribution between -2.475 and 2.475. Both methods are 
characterized: (1) by the use of a relatively small number of subjects, 
such as 500, in the whole procedure of estimation; (2) by not assuming any 
prior mathematical model; and (3) by the use of the estimated joint 
distribution of the latent trait and its maximum likelihood estimate. In the 
Two-Parameter Beta Method, the method of moments is adopted to approximate 
the probability density function of the maximum likelihood estimate, using 
polynomials of degree 3 and 4. The first two 
conditional moments of the latent trait, given the maximum likelihood 
estimate, are derived from theory and computed for the data for each value 
of the maximum likelihood estimate. The conditional distribution of 
the latent trait, given the maximum likelihood estimate, is 
approximated by a Beta distribution using the method of moments, with two a 
priori set parameters and two estimated parameters from the 
conditional moments. 
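
The last step, fitting a Beta distribution by the method of moments from a conditional mean and variance, can be sketched as follows. The sketch rests on an assumption that the abstract does not confirm: that the two a priori set parameters are the lower and upper limits of the Beta distribution's interval, leaving the two shape parameters to be estimated. The function name is hypothetical.

```python
def fit_beta_by_moments(mean, variance, lower, upper):
    """Return shape parameters (alpha, beta) of a Beta distribution on
    [lower, upper] whose first two moments match mean and variance.

    Assumption: lower and upper play the role of the two parameters
    set a priori; only alpha and beta are estimated from the moments.
    """
    width = upper - lower
    m = (mean - lower) / width      # mean rescaled to [0, 1]
    v = variance / width**2         # variance rescaled to [0, 1]
    if not 0 < m < 1 or not 0 < v < m * (1 - m):
        raise ValueError("moments are not attainable by a Beta distribution")
    common = m * (1 - m) / v - 1    # equals alpha + beta
    return m * common, (1 - m) * common
```

As a check, a Beta(2, 3) distribution on [0, 1] has mean 0.4 and variance 0.04, and the function recovers alpha = 2 and beta = 3 from those two moments.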
Robust OCR of degraded documents 

Natarajan, P; Bazzi, I; Zhidong Lu; Makhoul, J; Schwartz, R 
GTE Corp., Cambridge, MA, USA 

Proceedings of the Fifth International Conference on Document Analysis and 
ABSTRACT: 

This paper is concerned with techniques for performing robust OCR of 
degraded documents, such as faxed text, using a hidden Markov model (HMM) 
based OCR system. We present two strategies for dealing with degraded 
documents. The first strategy is to train the system on degraded documents 
that have been subjected to the same, or similar, degradation process as 
the documents to be recognized. The second, more sophisticated, strategy is 
to use adaptation to adjust the parameters of the trained model 
in order to improve recognition accuracy on a specific document. This 
adjustment of model parameters is typically posed as a 
constrained optimization problem wherein a certain prespecified 
objective function is to be optimized. We present a comparative study 
of two objective functions: the likelihood function and the posterior 
probability. A variation of the basic posterior probability 
method is also discussed. Using adaptation with a model trained on 
fax-degraded data we have reduced, by a factor of three, the character 
error rate on fax-degraded text images generated from the University of 
Washington English Image Database I. 
RECOGNITION; LIKELIHOOD; EDUCATIONAL COURSES; TARGET FUNCTION; 
PROBABILITY FUNCTION 
Abstract: Fault tree quantification enables not only the 
probability of the top event to be calculated but in addition its 
failure rate, expected number of occurrences and also importance measures 
which signify the contribution each basic event makes to system failure. 
Due to the large number of failure combinations (minimal cut 
sets) it is not possible using conventional techniques to calculate 
these parameters exactly and approximations are required. Most of the 
approximations rely on the basic events having a small likelihood of 
occurrence. When this condition is not met it results in large 
inaccuracies. These problems can be overcome by employing the binary 
decision diagram (BDD) approach. This method converts the fault tree 
diagram to a format which encodes Shannon's decomposition and allows the 
exact failure probability to be determined in a very efficient 
calculation procedure. By making use of the system probability 
function to obtain the criticality function other top event 
parameters as well as component importance measures can be 
calculated. This paper describes how the BDD method can be employed in 
fault tree quantification. (10 Refs) 
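
The role of Shannon's decomposition in obtaining an exact top event probability can be illustrated with a small sketch. This is not the BDD algorithm of the paper: it expands the structure function one basic event at a time without sharing or reducing nodes, so it is exponential in the number of events, whereas a BDD encodes the same expansion compactly. The function name and the example fault tree are hypothetical.

```python
def top_event_probability(structure, probs):
    """Exact top event probability by recursive Shannon expansion.

    structure: function taking a dict {event_name: bool} and returning
               True when the top event occurs.
    probs:     dict {event_name: probability that the basic event occurs},
               with basic events assumed statistically independent.
    """
    names = sorted(probs)

    def expand(index, assignment):
        if index == len(names):
            return 1.0 if structure(assignment) else 0.0
        name = names[index]
        q = probs[name]
        # Shannon decomposition: P(f) = q * P(f | x=1) + (1 - q) * P(f | x=0)
        return q * expand(index + 1, {**assignment, name: True}) + (
            1 - q
        ) * expand(index + 1, {**assignment, name: False})

    return expand(0, {})


# Hypothetical fault tree: TOP = A OR (B AND C).
def example_tree(events):
    return events["A"] or (events["B"] and events["C"])


example_probs = {"A": 0.1, "B": 0.2, "C": 0.3}
exact = top_event_probability(example_tree, example_probs)
# Rare-event approximation: sum of minimal cut set probabilities.
approximate = example_probs["A"] + example_probs["B"] * example_probs["C"]
```

For this tree the exact value is 1 - 0.9 x 0.94 = 0.154, while the rare-event approximation gives 0.16; the gap widens as the basic event probabilities grow, which is the inaccuracy the abstract refers to.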
Computer Associates International Inc (081957) 
1 Computer Associates Plaza 
Islandia, NY 11749 United States 
TELEPHONE: (631) 342-6000 
CONTACT: Sales Department 

2002-05-30 00:00:00 Computer Associates' Unicenter (R) Performance 
Management offers end-to-end performance management, incorporating 
predictive management capabilities for early problem detection and 
prevention. Using adaptive pattern recognition and neural network 
techniques, it predicts effects of changing system characteristics such as 
fluctuating workload, system activity, and memory utilization on system 
performance. U.S. patented Neugents (TM) technology learns normal operating 
behavior by monitoring the system running conventional workloads and 
analyzing historical performance data. Data modeling and pattern matching 
are used to build a Personality Profile uniquely tuned to the operating 
characteristics of the machine. This product can also be updated to 
incorporate changes in machine hardware, software, or usage. A single 
profile can be applied to a series of machines for enterprisewide 
deployment. Comparing current operating conditions with the Profile enables 
Neugents technology to identify unique circumstances and subtle 
abnormalities. The historical data can be reviewed to confirm the 
Personality Profile configuration will identify real error situations and 
deliver accurate problem prediction. In addition, Neugents technology can 
detect new behavioral patterns. If the system configuration is altered, 
these predictive agents can learn a new Personality Profile. Moreover, 
Unicenter Performance Management is highly customizable, so users can 
adjust the probability at which predicted errors are alerted. 

DESCRIPTORS: Capacity Planning; Computer Diagnostics; Data Center 
Operations; Load Balancing; Network Administration; Network Management; 
Network Software; Neural Networks; Pattern Recognition; Performance 
Monitors; Software Agents; System Monitoring 

HARDWARE: Apple Macintosh; HP; IBM 390; IBM Mainframe; IBM PC & 
Compatibles; Sun; UNIX 
OPERATING SYSTEM: HP-UX; Linux; MacOS; OS/390; Solaris; UNIX; Windows 
NT/2000 

PROGRAM LANGUAGES: Not Available 

TYPE OF PRODUCT: Mini; Micro; Workstation 

POTENTIAL USERS: Cross Industry, Data Centers 
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Statistix for Windows 7 from Analytical Software is a comprehensive, 
menu-driven statistical program that performs all of the basic and advanced 
statistical procedures needed by most users. It offers descriptive
statistics, t-tests, non-regression analysis, ANOVA, statistical process 
control (SPC) charts, contingency tables, time series, data management, 
association tests, and linear models. Statistix features regression, 
analysis of variance, probability functions, sample tests, and 
randomness and normality tests. Users can import data from spreadsheet, 
database, and text files. They can produce and export publication-quality 
graphs and charts. 

DESCRIPTORS: Regression Analysis; Research & Development; Statistics; Time 
Series 

HARDWARE: IBM PC & Compatibles 

OPERATING SYSTEM: Windows; Windows NT/2000; Windows XP 
PROGRAM LANGUAGES: Not Available 
TYPE OF PRODUCT: Micro 
POTENTIAL USERS: Researchers 
DATE OF RELEASE: 01/1985 

PRICE: $495; includes unlimited support; Internet trial available

NUMBER OF INSTALLATIONS: 30000 
DOCUMENTATION AVAILABLE: User manuals 

TRAINING AVAILABLE: Telephone support; technical support; e-mail support 
REVISION DATE: 20030708 
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NeuralWorks Predict 3.0, formerly NeuralSIM, from NeuralWare is a 
development environment for creating and deploying neural network-based 
applications. NeuralSIM combines neural network technology with
statistics, genetic algorithms, and fuzzy logic to determine the best 
solutions to problems. It has applications in modeling, forecasting, 
classification, industrial inspection, process control, stock market 
timing, and classification applications. NeuralWorks helps developers 
create effective models. It analyzes data, transforms data, and selects the 
options and training methods to optimize a model. Its two proprietary 
training algorithms can process clean or noisy data. More than 200 
parameters give developers control over modeling. NeuralWorks 
Predict's seamless interface to Microsoft (R) Excel means that users can
collect model data in Excel spreadsheets. They can work with the familiar 
Excel interface much of the time. The three-level interface lets users 
choose the level that is suitable for them. Features of NeuralWorks Predict 
include wizards and online context-sensitive help; useful default settings; 
special features to improve deployment; genetic variable selection to solve 
difficult problems; diagnostics to verify models; interfaces to, and 
generation of code in, Visual Basic, Fortran, and C; and a run-time system. 
The major components are Data Transformation, Data Selection, Network 
Training, Variable Selection, and Code Generation. The Data Transformation 
component can analyze and transform fuzzy transforms, nonlinear transforms, 
and enumerated types. 

DESCRIPTORS: Artificial Intelligence; Expert Systems; Fuzzy Logic; Genetic 
Algorithms; Models; Neural Networks; Program Development 

HARDWARE: IBM PC & Compatibles; IBM RS/6000; Silicon Graphics; Sun; UNIX 

OPERATING SYSTEM: AIX; Excel; IRIX; Linux; Solaris; UNIX; Windows; Windows 
NT/2000; Windows XP 

PROGRAM LANGUAGES: C; C++; Fortran; Visual Basic 

TYPE OF PRODUCT: Mini; Micro; Workstation 

POTENTIAL USERS: Cross Industry, Simulation, Developers 

PRICE: $2,495 and up; depends on platform 

DOCUMENTATION AVAILABLE: Online documentation; tutorials 
TRAINING AVAILABLE: Technical support; telephone support 

OTHER REQUIREMENTS: 4MB RAM; 80386+ CPU; EGA+ graphics; 5MB disk space;
Excel 7 or 97
REVISION DATE: 20030917 
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2021 E Hennepin Ave 

Minneapolis, MN 55413-2726 United States 
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CONTACT: Sales Department 

Stat-Ease offers Design-Ease (R) 6, which helps researchers set up and 
analyze two-level factorial, Taguchi orthogonal arrays, fractional 
factorial, Plackett-Burman, and other experimental designs. These designs 
can quickly identify critical variables and directions for 
improvement. They are particularly well-suited for the early stages of 
product or process optimization. Design-Ease provides information on 
the resolution and alias structure to guide users to a good design. It can 
add center points to the factorial designs. Experiments can run in blocks 
or completely randomly. It is easy to learn and fast. Experimental designs 
are selected from easy-to-understand menus. To aid users in their choice, 
the alias structure for each fractional factorial design is given. Designs 
can be run completely randomized or in blocks. Interaction plots help users 
interpret significant two-factor interactions. Design-Ease can tap into 
two-level designs that extend to 256 runs; avoid the risk of wrong answers 
due to confounding; copy statistical outputs to a word processor and 
show these to customers or managers; protect users from lurking 
variables; calculate response data using familiar spreadsheets, then 
paste it to Design-Ease; and find an optional design showing all the 
details about resolution and aliasing. Design-Ease can also test for 
curvature using factorial design center points, helping minimize 
total experimentation and production costs; use residual plot analysis 
validation; produce analysis of variance (ANOVA); and handle botched or
missing data. 
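
As a rough illustration of the two-level factorial designs mentioned above, the following Python sketch builds a full 2^k design in coded units (-1 and +1) and estimates each main effect as the mean response at the high level minus the mean response at the low level. The function names and the response values are hypothetical; nothing here is taken from Design-Ease itself.

```python
from itertools import product

def full_factorial(k):
    """Return all 2**k runs of a two-level design in coded units (-1, +1)."""
    return [list(run) for run in product((-1, 1), repeat=k)]

def main_effects(design, responses):
    """Main effect of each factor: mean response at +1 minus mean at -1."""
    k = len(design[0])
    effects = []
    for f in range(k):
        high = [y for run, y in zip(design, responses) if run[f] == 1]
        low = [y for run, y in zip(design, responses) if run[f] == -1]
        effects.append(sum(high) / len(high) - sum(low) / len(low))
    return effects

design = full_factorial(3)
# Made-up responses: y = 10 + 2*A - 1*B, so factor C has no effect.
responses = [10 + 2 * a - 1 * b for a, b, c in design]
effects = main_effects(design, responses)
```

A fractional factorial design would run only a subset of these rows, which is where the alias structure mentioned in the description becomes relevant.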

DESCRIPTORS: CAE; Engineering; Graphics for Science & Engineering; 
Industrial Engineering; Research & Development; Science; 
Statistics 

HARDWARE: 80486; IBM PC & Compatibles; Pentium 
OPERATING SYSTEM: Windows; Windows NT/2000; Windows XP 
PROGRAM LANGUAGES: Not Available 
TYPE OF PRODUCT: Micro 

POTENTIAL USERS: Industrial Scientists, Industrial Engineers, Industrial
Researchers
DATE OF RELEASE: 06/1997 

PRICE: $395; net 30; 30-day money-back guarantee 
DOCUMENTATION AVAILABLE: User manuals; reference manuals 
TRAINING AVAILABLE: Training; technical support 
OTHER REQUIREMENTS: 16MB — Win 95, 32MB+ RAM required 
SERVICES AVAILABLE: Consulting; warranty 
REVISION DATE: 20030228 
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Assessment Systems Corp (616826) 
2233 University Ave #200 
St Paul, MN 55114-1629 United States 
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XCALIBRE from Assessment Systems uses marginal maximum-likelihood 
methods for estimating item parameters for both the two-parameter and 
three-parameter item response theory (IRT) models. Because it is based upon
marginal maximum-likelihood methods, it can estimate item
parameters for datasets with fewer items and/or fewer examinees than
conventional maximum-likelihood approaches. XCALIBRE implements
Bayesian prior distributions on the individual item parameters, and
these prior distributions can be updated during estimation. Researchers can
fix selected item parameters to known (prespecified) values and
automatically calibrate the remaining item parameters to fit 
that scale, thus making item pool linking/equating much simpler. XCALIBRE 
can differentiate between not answered and skipped items. 
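
For readers unfamiliar with the models named above, here is a minimal Python sketch of the three-parameter logistic item response function in its common textbook form. It shows only the response curve whose parameters are being estimated, not XCALIBRE's marginal maximum-likelihood procedure. The function name is hypothetical, and some formulations multiply the exponent by a scaling constant (often 1.7), which is omitted here.

```python
import math

def p_correct_3pl(theta, a, b, c):
    """Probability of a correct response under the three-parameter logistic model.

    theta: examinee ability
    a: item discrimination
    b: item difficulty
    c: lower asymptote (pseudo-guessing); c = 0 gives the two-parameter model
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))
```

When ability equals difficulty (theta == b), the probability sits halfway between the lower asymptote c and 1.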

DESCRIPTORS: Colleges & Universities; Social Science; Software Testing; 
Statistics; Survey Research 

HARDWARE: IBM PC & Compatibles 

OPERATING SYSTEM: Windows; Windows NT/2000; Windows XP 

PROGRAM LANGUAGES: Not Available 

TYPE OF PRODUCT: Micro 

POTENTIAL USERS: Researchers, Science 

PRICE: $399; $950 — complete test analysis package; Internet demo available 

DOCUMENTATION AVAILABLE: Online documentation 
REVISION DATE: 20030518 
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Scientific Software International Inc (465534) 
7383 N Lincoln Ave #100 
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MULTILOG 7 from Scientific Software International employs item-response 
theory to perform analyses and test scoring for multiple-category items. It 
provides item parameter estimation and subject scoring under the Samejima
logistic model for graded responses, the Bock multinomial logit model for
multiple nominal categories, the Bock-Samejima-Thissen model for multiple
choice items with guessing, and Masters' partial-credit model. These models 
can be fit to a latent ability continuum by marginal maximum 
likelihood or to a manifest ability criterion by maximum likelihood. 
MULTILOG has the capacity to impose equality constraints on selected 
subsets of item parameters, making it possible to analyze models 
intermediate between conventional 1-, 2-, and 3-parameter logistic models. 
MULTILOG also permits (quasi-) continuous measured variables to be 
mixed with the multiple category responses. 

DESCRIPTORS: Colleges & Universities; Schools; Social Science; 
Statistics; Survey Research; Test Scoring 

HARDWARE: IBM PC & Compatibles 

OPERATING SYSTEM: Windows; Windows NT/2000 

PROGRAM LANGUAGES: Not Available 

TYPE OF PRODUCT: Micro 

POTENTIAL USERS: Researchers 

PRICE: Available upon request; educational discounts available 

DOCUMENTATION AVAILABLE: Online documentation 
TRAINING AVAILABLE: Technical support 
REVISION DATE: 20030228 
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HLM 5 from Scientific Software International supports hierarchical linear 
modeling (HLM) and nonlinear modeling. As users specify variables at 
each level, the software constructs relevant equations for each level in a 
graphics box. These are saved and can be easily modified for subsequent 
analysis. HLM can read data from a variety of statistical packages 
including SPSS, SAS, SYSTAT, and STATA. It produces residual files that can 
immediately be read into these packages. Thus, all of the familiar 
exploratory analysis methods, data transformations, and graphical 
capabilities of these packages are readily available. HLM allows estimation 
of Bernoulli and binomial models for binary data with a logit link function 
and Poisson models for count data with constant or variable exposure with 
the log link function. Estimation is available for two- and three-level 
models with and without over-dispersion. Users can analyze data at the 
person level or grouped by covariate set. HLM provides estimation of 
population-average models using generalized estimating equations with and 
without robust standard errors as described by Zeger, Liang, and Albert 
(1988). HLM combines EM and Fisher scoring algorithms to produce a high 
standard of speed and reliable convergence for both two-level and 
three-level programs. Full maximum likelihood for two- and
three-level hierarchical linear models and full penalized quasi-likelihood
estimates for hierarchical generalized linear models are accompanied by 
standard errors for variance-covariance components. Replicated analyses for 
multiply imputed datasets such as the National Assessment of Educational 
Progress, the National Adult Literacy Survey, and the International Adult 
Literacy Survey are available for the two-level model. Newer features 
include unrestricted models, multinomial regression for two-level data, 
latent variable analysis, and ordinary least squares vs. estimates 
comparisons.

DESCRIPTORS: Colleges & Universities; Models; Regression Analysis; 
Research & Development; Science; Social Science; Statistics

HARDWARE: IBM PC & Compatibles 

OPERATING SYSTEM: Windows; Windows NT/2000 

PROGRAM LANGUAGES: Not Available 

TYPE OF PRODUCT: Micro 

POTENTIAL USERS: Researchers, Science 

PRICE: $430; upgrade pricing; user manual — $35; educational discounts 
available; student version--$0 

DOCUMENTATION AVAILABLE: User manuals 
TRAINING AVAILABLE: Technical support 
OTHER REQUIREMENTS: Win 9x+ required
REVISION DATE: 20030228 
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Statistical Designs (539058 
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Houston, TX 77075 United States 
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MultiSimplex 2.0 rapidly optimizes technical systems and processes 
using the basic or modified sequential simplex optimization methods. 
It can simultaneously adjust 15 continuous control variables. Fifteen 
different responses can be specified and combined using functions 
determined by fuzzy theory. Maximum, minimum or target values 
can be specified for each response. Results and graphical summaries are 
available throughout the process using the graphing capabilities of Excel.
The program allows users to define experiment points at any time, thus 
taking into account existing experimental information. 

DESCRIPTORS: CAE; Chemistry; Fuzzy Logic; Goal Seeking; Industrial 
Engineering; Process Control; Research & Development; Science 

HARDWARE: 80486; IBM PC & Compatibles; Pentium 
OPERATING SYSTEM: Windows; Windows NT/2000; Windows XP 
PROGRAM LANGUAGES: Not Available 
TYPE OF PRODUCT: Micro 

POTENTIAL USERS: Science, Engineering, Research & Development 
DATE OF RELEASE: 01/1997 

PRICE: $1,299; $399 — academic; volume discounts available 
DOCUMENTATION AVAILABLE: User manuals; tutorials 

TRAINING AVAILABLE: Training; telephone support; technical support; e-mail
support

OTHER REQUIREMENTS: 8MB RAM; 80486+ CPU; Win 9x or NT 4.0 required 
REVISION DATE: 20010530 
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SmallWaters Corp (639206) 
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Amos 4.0 from SmallWaters allows easy structural equation modeling and 
confirmatory factor analysis. Features include fully interactive, graphical 
path modeling that delivers presentation quality path diagrams for reports 
and publications; analysis of mean structures displayed in the path 
diagram; multi-group analysis, including different models for different 
groups; and missing data analysis by full information maximum 
likelihood for efficient parameter estimates. Amos 4.0 users can set 
equality constraints by using the same name for two or more 
parameters or value constraints by entering a number. Estimation 
methods include ML, ULS, GLS, ADF, and scale-free LS; fit 
statistics include chi-square, AIC, FO, RMSEA, ECVI, and many others; 
bias estimates and empirical confidence estimates tap bootstrap simulation 
for any empirical data distribution. Amos also offers Bollen-Stine 
corrected bootstrap for model testing under nonnormality. With Amos,
multiple models are analyzed simultaneously (it determines which models are
nested and calculates the test statistics between them). Amos's newer
features include direct, indirect, and total effects; p-values for 
individual parameters; a drag-and-drop interface; and broad Excel 
integration. Amos supports all European languages and Japanese, and it 
supports files in popular office and statistical applications. 

DESCRIPTORS: Colleges & Universities; Foreign Language Packages; Models; 
Research & Development; Science; Statistics 

HARDWARE: IBM PC & Compatibles; Pentium 

OPERATING SYSTEM: Excel; Windows; Windows NT/2000 

PROGRAM LANGUAGES: Not Available 

TYPE OF PRODUCT: Micro 

POTENTIAL USERS: Researchers, Science 

PRICE: Available upon request; free student version available on Web;
approximately 688 Euros; upgrade pricing

DOCUMENTATION AVAILABLE: User manuals; online documentation 
OTHER REQUIREMENTS: 16MB RAM required; 64MB RAM recommended 
REVISION DATE: 20030320 
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PRODUCT NAME: Experimental Data Analyst (640182) 

Wolfram Research Inc (443352) 
100 Trade Center Dr 

Champaign, IL 61820-7237 United States 
TELEPHONE: (217) 398-0700 
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CONTACT: Sales Department 

Wolfram Research's Experimental Data Analyst provides a set of detailed 
programs and packages for the fitting, visualization, and error 
analysis of experimental data. Extensive error analysis capabilities handle 
errors in both coordinates of the data, obtain estimated errors in the 
fit parameters, and examine graphical information about the 
fit including residuals of the fit. Data can be fit to 
linear or arbitrary models. Users can fit data to lines or curves 
when one or more of the data points can be wild and the least-squares 
technique cannot be used. For advanced problems, researchers can customize 
the behavior of the fitting routines by selecting from numerous 
options. Or, for less complex cases, EDA users can simply rely on the 
defaults for quick, accurate solutions. A variety of data transformation 
techniques, such as data smoothing and noise elimination, are available, as 
well as routines that automatically propagate errors of precision. 
Experimental Data Analyst's graphics capabilities provide a rich 
environment for visualizing experimental data. An extension of 
Mathematica's ListPlot function visualizes errors in data coordinates with
error bars. The distribution of data values can be viewed pictorially using 
histograms or box plots. Users can fully control the display based on the 
data, the number of bins, and the minimum and the maximum. 

DESCRIPTORS: Engineering; Research & Development; Science; 
Statistics 

HARDWARE: Alpha; Apple Macintosh; HP; HP 9000; IBM PC & Compatibles; IBM
RS/6000; Silicon Graphics; Sun; UNIX
OPERATING SYSTEM: AIX; DOS; HP-UX; IRIX; Linux; MacOS; NextStep; OS/2;
Solaris; SunOS; UNIX; VMS; Windows; Windows NT/2000
PROGRAM LANGUAGES: Not Available 
TYPE OF PRODUCT: Mini; Micro; Workstation 

POTENTIAL USERS: Engineers, Financial and Data Analysts, Physical 
Scientists 

PRICE: $495; educational discounts available 

TRAINING AVAILABLE: Technical support; support contracts 
REVISION DATE: 20030414

Linux, Engineous Software's iSight Six Sigma, and ANSYS's Ansys
DesignXplorer are products with probabilistic methodology abilities that 
can predict when objects will break. With probabilistic methodology, a 
statistical approach is taken to random variables and 
determining probabilities. In engineering, design variables 
such as geometries, temperature, and material properties are extended to 
show a more precise and in-depth view of the performance of a design and 
its reaction to an environment. The iSight Six Sigma engine is used mostly 
for automotive applications and uses finite element analysis (FEA) tools to 
optimize designs. MSC.Software has created easy-to-use engineering
applications for probabilistic methods. Veros Software provides several 
probabilistic engines that can be used with FEA programs. Eric Fox, VP of 
technology for Veros, says probabilistic studies consider many load
combination possibilities and assist in choosing those that are most
typical. Results show the probability that the load will be exceeded 
and what events will most influence breakage. Probabilistic methods can 
both save money and improve the quality of designs. For purposes of 
prediction, probabilistic methodology uses a physics-, behavior-, rule-, or
process-based predictive model and considers various uncertainties and
the possibility of error. It also uses past performance data to improve
accuracy, but does not require past performance data to build a predictive
model.

COMPANY NAME: Engineous Software Inc (628077); ANSYS Inc (060607);
Vendor Independent (999999)
SPECIAL FEATURE: Screen Layouts Tables

DESCRIPTORS: Auto Manufacturing; CAE; Engineering; FEA (Finite Element
Analysis); Linux; Maintenance Management; Quality Assurance
REVISION DATE: 20031030

Red Hen Systems' MapCalc Learner, an excellent revolutionary raster
software package, is available with single user or academic lab licenses. 
It offers superb documentation and many Berry-written case studies, 
examples, graphics, and workflows. With MapCalc Learner, users can perform 
all types of features on their wish lists. MapCalc Learner has a top-notch 
selection of standard raster operators and also includes advanced and 
robust operations. For instance, Clump identifies contiguous areas, while 
Configure permits many shape and structural analyses. Size computes areas 
for each analysis, and Span computes the minimum width of each 
contiguous area from edge to edge. Spread is a module that creates 
traditional buffers but also permits spreading over an elevation data 
surface or through a friction surface. Composite permits users to compute 
parameters from one map in the categories of another map. Analyze can 
compute many statistics for each cell over multiple maps. Other 
functions described include Crosstab, Intersect, Drain, Stream, and 
Radiate. Cluster can do a classification process, while Correlate creates a 
correlation matrix for multiple maps. Compare creates a table of 
statistical values that compare two maps, and Regress does a linear 
regression on values in each cell in multiple separate maps; Regress then
outputs a map with an estimated value for the dependent map's cell. 

PRICE: $22 

COMPANY NAME: Red Hen Systems Inc (666491) 
SPECIAL FEATURE: Screen Layouts 

DESCRIPTORS: Graphics Tools; Image Processing; Mapping 
REVISION DATE: 20011030 
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ABSTRACT: The ASYSTANT+ data acquisition and analysis software package 
from Macmillan Software is an add-on package for the company's ASYSTANT
program, which converts IBM PCs and compatibles into sophisticated desktop 
calculators. The $895 ASYSTANT+ adds the ability to control data 
acquisition accessories, using a user interface similar to stack-oriented 
hand-held electronically-programmable calculators such as those offered by
Hewlett-Packard. While the package cannot offer the performance of
dedicated instruments, it does provide scientific and engineering data
acquisition and analysis functions such as waveform generation and
processing, three-dimensional graphics, data file operations, curve
fitting, polynomials, statistics, differential equations, a
notepad, and a DOS command menu along with its calculator operations. The
system requires 640Kbytes of RAM, an 8087 or 80287 math coprocessor, two
diskette drives or one diskette drive and a hard disk drive, and a graphics 
board. 

TEXT: 

In the scientific laboratory, data acquisition and analysis programs 
are playing an increasingly important part in the manipulation of 
experimental data. Similarly, such programs can be used in a variety of 
industrial applications to control simple processes.

Macmillan Software's ASYSTANT+ converts the IBM PC and compatibles 
into a desktop data acquisition and analysis system comprising several 
virtual instruments. For many applications that can tolerate moderate 
sampling rates, ASYSTANT+ can take the place of more expensive, dedicated 
instruments — albeit at a loss in ultimate performance. 

The basic version of the program, ASYSTANT, converts the PC into a 
sophisticated calculator. To that basic capability, the more advanced 
version, ASYSTANT+, adds the ability to control a data acquisition 
accessory. ASYSTANT+'s capabilities are similar to those of a sister
product, ASYST, which provides a FORTH interpreter-like user interface.

A SOPHISTICATED CALCULATOR 

ASYSTANT+'s basic user interface is similar to that of a
stack-oriented, hand-held, electronic programmable calculator, such as the 
various Hewlett-Packard (HP) models. In fact, the main screen display is 
referred to as the desktop calculator and resembles a calculator in 
functionality. It is divided into five windows, four of which 
correspond to the facilities of an advanced programmable calculator (see 
photo 1). The fifth window contains the main options that access other
parts of the program, such as waveform processing and generating, graphics, 
and curve fitting. 

The calculator windows are stack contents, calculator
functions, parameters, and variables. Three other calculator
menus (array operations, conversions and special functions, and wave and
matrix operations) can be interchanged with the calculator functions (see
figure 1). Each calculator menu includes the selection, next, to display the
next calculator menu.

A key concept in learning to use ASYSTANT+ is that of the stack — an 
area of memory used for temporary data storage. Data can be placed on the 
stack from the keyboard or from other storage areas, and can be removed 
from the stack to be placed in other storage areas. Most operations and 
functions take their arguments from the stack and leave their results on 
the stack. HP calculator users and FORTH programmers should be comfortable
with the system. 

The program begins with a cursor positioned on the first selection of 
the main menu, acquire. Pressing PgUp moves the cursor to the calculator 
functions menu. This gives the expected assortment of mathematical 
functions and stack operators — store, stores the entry at the top of the 
stack in a parameter or variable; dup, duplicates the top entry in the 
stack; drop, drops the entry at the top of the stack; swap, switches the 
top two entries on the stack; and roll, places the bottom entry on the 
stack on the top and pushes the other entries down one. A status selection
allows the user to select the format of numeric output, angular units for
use with trigonometric functions, and data type: integer, double-precision
integer, real, double-precision complex.
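
The stack operators described above can be sketched in Python with a list whose last element is the top of the stack. The function names mirror the menu selections, but the implementation is only an illustrative assumption and says nothing about how ASYSTANT+ works internally.

```python
def dup(stack):
    """Duplicate the top entry."""
    stack.append(stack[-1])

def drop(stack):
    """Discard the top entry."""
    stack.pop()

def swap(stack):
    """Exchange the top two entries."""
    stack[-1], stack[-2] = stack[-2], stack[-1]

def roll(stack):
    """Move the bottom entry to the top; the other entries shift down one."""
    stack.append(stack.pop(0))

stack = [1, 2, 3]   # 3 is on top, 1 is at the bottom
swap(stack)         # top two entries trade places
roll(stack)         # former bottom entry is now on top
```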

Calculator commands can be entered by moving the cursor to the 
desired selection with the arrow keys and pressing Enter or by typing them 
at the keyboard. When a number, letter, or operator is typed, the main menu 
window clears and a command line area appears in its place, regardless of 
the location of the cursor. 

Commands can be entered in Reverse Polish Notation (RPN) used by HP 
and FORTH or in algebraic notation. The program expects RPN; algebraic
notation always must be preceded with the character. Commands can be
entered in strings and then are terminated with the Enter key. Entering a
valid number places a result on the stack. ASYSTANT+'s stack is limited to
five entries, which are displayed in the stack contents window. Stack 
entries can be integers, real numbers, complex numbers, or arrays of 
integers, real numbers, or complex numbers. 
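
To make the RPN convention concrete, the following Python sketch evaluates a space-separated command string against a five-entry stack. The token syntax, the function name, and the choice to raise an error on overflow are all assumptions made for illustration; the article does not say how ASYSTANT+ parses commands or handles a full stack.

```python
import operator

MAX_DEPTH = 5  # the article states that the stack holds five entries

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def eval_rpn(command):
    """Evaluate a space-separated RPN string and return the final stack."""
    stack = []
    for token in command.split():
        if token in OPS:
            right = stack.pop()   # the top entry is the right-hand operand
            left = stack.pop()
            stack.append(OPS[token](left, right))
        else:
            stack.append(float(token))
        if len(stack) > MAX_DEPTH:
            # Assumption: overflow is treated as an error in this sketch.
            raise OverflowError("stack limited to five entries")
    return stack

result = eval_rpn("3 4 + 2 *")   # algebraic equivalent: (3 + 4) * 2
```

Operands are entered first and the operator last, so no parentheses are needed to control the order of evaluation.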

In the calculator menu, macros ("user functions") can be assigned to 
the ten function keys. Each key can be assigned up to five lines of RPN or 
algebraic notation. Pressing a function key while in the calculator
executes the macro. The macro assigned to one key can include the name of
another key, so that additional functions may be performed by a single
macro.

The parameters and variables windows on the main
screen display provide two types of storage registers, nine of each.
Parameters, A through I, store numbers; variables, R
through Z, store either numbers or arrays. Parameter and variable values
can be copied to the stack, and stack entries can be copied or moved to
the parameter and variable registers. Parameters and variables
are available in all parts of the program, and they can be assigned
descriptive names.

VECTORS AND MATRICES 

The array operations menu is displayed by selecting the next option 
from the calculator functions menu. It offers a set of commands to create 
and manipulate arrays. ASYSTANT+ provides for two types of arrays:
one-dimensional arrays, or vectors, and two-dimensional arrays, or
matrices. An array occupies one slot on the stack or one variable. Arrays
cannot be stored in parameters.

The program uses two 64KB segments of RAM for storage of arrays. One 
segment contains the arrays assigned to the variables R through Z, 
and the other segment contains any unnamed arrays on the stack. A single 
array can occupy an entire 64KB segment. 

The array operations menu offers selections for the basic vector and 
matrix operations. Three commands, n:ramp, nm:ramp, and aedit, generate 
unnamed arrays and place them on the stack. N:ramp takes the top entry on
the stack as the size of a one-dimensional array (vector) and replaces it
with a vector in which the ith element contains the value i; that is, the
value of each element is equal to its index. Nm:ramp takes the top two
entries as the number of rows and columns of a two-dimensional array and
replaces them with an array in which the ijth element contains the
value (i-1)m+j, where i is the row index, j is the column index, and m is
the number of columns.
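
The two ramp generators can be sketched in Python as follows, using 1-based indices to match the description: element i of the vector holds i, and element (i, j) of the n-by-m array holds (i - 1) * m + j. The function names are hypothetical stand-ins for n:ramp and nm:ramp, and the arrays are plain nested lists rather than stack entries.

```python
def n_ramp(n):
    """Vector whose i-th element (1-based) equals i."""
    return list(range(1, n + 1))

def nm_ramp(n, m):
    """n-by-m array whose (i, j) element (1-based) equals (i - 1) * m + j."""
    return [[(i - 1) * m + j for j in range(1, m + 1)]
            for i in range(1, n + 1)]

vector = n_ramp(4)
matrix = nm_ramp(2, 3)
```

The effect is that the matrix elements count upward row by row, which makes ramps convenient test data for the other array commands.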

The commands xsect, sub, trans, diag, and reverse access certain
array elements. Xsect takes the top element of the stack and replaces it 
with an element, sub with a subarray, trans with the transpose of the 
array, diag with the main diagonal, and reverse replaces the top element of 
the stack with an array with reversed column indices. 

Arrays can be reordered with the commands n:rot, reshape, sort, and 
lookup, and they can be indexed with index and n:search. Two commands
combine two arrays to form a third: cat stacks the two arrays one over the
other, and lam places them side by side. Cumulative operations can be 
performed on the rows of an array to calculate sums and products and find 
cumulative maxima and minima. 
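
A minimal Python sketch of the combining and cumulative operations, following the article's description for two-dimensional arrays only. The function names echo the commands, but the behavior for one-dimensional arrays and for mismatched shapes is not described in the article and is not modeled here.

```python
from itertools import accumulate
import operator

def cat(a, b):
    """Stack array b below array a (same number of columns assumed)."""
    return [row[:] for row in a] + [row[:] for row in b]

def lam(a, b):
    """Place array b to the right of array a (same number of rows assumed)."""
    return [row_a + row_b for row_a, row_b in zip(a, b)]

def cumulative(rows, func):
    """Apply a running binary operation along each row."""
    return [list(accumulate(row, func)) for row in rows]

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
stacked = cat(a, b)
side_by_side = lam(a, b)
running_sums = cumulative(a, operator.add)
running_products = cumulative(a, operator.mul)
running_maxima = cumulative(a, max)
```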

Arrays can be examined in spreadsheet form with the array editor
function, aedit. Arrays can be created directly with aedit or with another
command (n:ramp, nm:ramp, lam, or cat, for example) and then edited. They
also can be built and edited by using other menu options and functions, but 
using the array editor is the easiest way to make minor changes. 

Switching to the third calculator menu, conversions and special
functions, provides an assortment of options for converting
data from one coordinate system to another, or from one data type to
another, as well as special advanced functions. Numbers can be
converted from a pair of values on the stack, one real and one
imaginary, into a single complex entry on the stack. A single complex entry
can then be split into a pair of values.

Data sets representing coordinates can be converted between Cartesian 
coordinates and polar coordinates or spherical coordinates. Also, complex 
numbers in the form x + jy can be converted to the polar form. (Note that 
PC Tech Journal is using the electrical engineering notation j for the 
imaginary part of the complex number rather than the mathematical form i.) 
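
The conversions described in the last two paragraphs can be illustrated with Python's standard math and cmath modules. This is only a sketch of the underlying arithmetic; the helper names are hypothetical, and angles are in radians here, whereas ASYSTANT+ lets the user choose angular units.

```python
import cmath
import math

def to_polar(x, y):
    """Cartesian (x, y) to polar (r, angle in radians)."""
    return math.hypot(x, y), math.atan2(y, x)

def to_cartesian(r, angle):
    """Polar (r, angle in radians) to Cartesian (x, y)."""
    return r * math.cos(angle), r * math.sin(angle)

# Pair a real and an imaginary value into one complex entry, then split it.
z = complex(3.0, 4.0)            # 3 + j4 in the article's notation
real_part, imag_part = z.real, z.imag

# Polar form of the same complex number.
magnitude, phase = cmath.polar(z)
```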

The more advanced functions include the error function, factorials, 
the number of combinations of n things taken r at a time, the number of
permutations of n things taken r at a time,
the Bessel functions, elliptic integrals, the gamma function,
and the incomplete beta function. 
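
Several of these functions have direct counterparts in Python's standard math module (math.comb and math.perm require Python 3.8 or later), which gives a quick way to see what they compute. The Bessel functions, elliptic integrals, and incomplete beta function are not in the standard library and are left out of this sketch.

```python
import math

n, r = 5, 2
combinations = math.comb(n, r)   # n! / (r! * (n - r)!): order ignored
permutations = math.perm(n, r)   # n! / (n - r)!: order matters
factorial = math.factorial(n)    # n!
gamma_value = math.gamma(n)      # equals (n - 1)! for a positive integer n
erf_at_zero = math.erf(0.0)      # the error function is zero at the origin
```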

The wave and matrix menu, the fourth calculator menu, offers several 
numerical techniques for the analysis of waveforms and matrices. Storing 
waveforms as arrays allows the use of many operations for the analysis of 
waveforms or matrices. A series of waveforms can be stored in a 
two-dimensional array, one waveform per row. 

Once the waveforms have been stored, two functions, smooth and 
window, are available to filter them. The smooth function, a low-pass 
filter, removes high-frequency components of a waveform in the time domain, 
to eliminate noise in a signal, for example. The window function simulates
a Blackman window, filtering out selected high and low frequencies. This
function is better suited to waveforms stored in the frequency domain. 
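
The article does not say which algorithms lie behind smooth and window, so the Python sketch below uses two common stand-ins: a moving average as a simple low-pass smoother, and the classic Blackman window coefficients (0.42, 0.5, 0.08). Both choices, and the function names, are assumptions made for illustration.

```python
import math

def moving_average(signal, width=3):
    """Simple low-pass smoothing: average each sample with its neighbors."""
    half = width // 2
    out = []
    for i in range(len(signal)):
        # Near the ends, average over however many neighbors exist.
        window = signal[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out

def blackman(n_points):
    """Classic Blackman window coefficients for n_points samples."""
    if n_points == 1:
        return [1.0]
    return [0.42
            - 0.5 * math.cos(2 * math.pi * n / (n_points - 1))
            + 0.08 * math.cos(4 * math.pi * n / (n_points - 1))
            for n in range(n_points)]

smoothed = moving_average([0.0, 3.0, 0.0])
weights = blackman(5)
```

Multiplying a sampled waveform element by element with the window weights tapers it toward zero at both ends, since the weights peak in the middle and vanish at the edges.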

A waveform can be integrated by using Simpson's 1/3 rule or
differentiated by using interpolating polynomials of a user-specified 
degree, as many as seven. Four functions are provided for Fourier 
transformations: fast Fourier transforms and inverse fast Fourier 
transforms for both one- and two-dimensional arrays. An additional function 
calculates the power spectrum (the square of tile magnitude of the Fourier 
transform) of an array. 
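
Two of these operations are easy to sketch. The Python below applies Simpson's 1/3 rule to equally spaced samples and computes a power spectrum as the squared magnitude of a discrete Fourier transform. It uses a direct (slow) transform for clarity rather than a fast Fourier transform, and the function names are hypothetical.

```python
import cmath


def simpson(samples, dx):
    """Integrate equally spaced samples with Simpson's 1/3 rule.

    The rule needs an even number of intervals (an odd number of samples).
    """
    intervals = len(samples) - 1
    if intervals < 2 or intervals % 2:
        raise ValueError("Simpson's 1/3 rule needs an even number of intervals")
    total = samples[0] + samples[-1]
    total += 4 * sum(samples[1:-1:2])  # odd-indexed interior points
    total += 2 * sum(samples[2:-1:2])  # even-indexed interior points
    return total * dx / 3


def power_spectrum(samples):
    """Squared magnitude of the discrete Fourier transform of the samples."""
    n = len(samples)
    spectrum = []
    for k in range(n):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * i / n)
            for i, x in enumerate(samples)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum


# Integrate x**2 from 0 to 2 using five samples spaced 0.5 apart.
print(simpson([0.0, 0.25, 1.0, 2.25, 4.0], 0.5))

# One cycle of a cosine sampled at four points.
print(power_spectrum([1.0, 0.0, -1.0, 0.0]))
```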

Other matrix operations included in the fourth calculator menu are 
the autocorrelation function, which is applied to the top entry on the 
stack; the aperiodic convolution of the top two entries; the application of 
a Blackman window to a subset of the top entry; the Hilbert transform of 
the top entry; and the cross correlation of the top two entries. By 
combining these advanced functions, the user can filter signals with
low-pass or band-pass filters to remove noise or isolate signal components,
process images, generate spectral analysis displays, generate diffraction 
patterns, and analyze signals in both the time and frequency domains. 

The program performs the basic statistical operations: average,
standard deviation, maximum, and minimum. A single operator is
provided to solve the matrix equation y = Ax. The operator expects the y
vector as the top stack entry and the A matrix (n by n) as the second
entry. It replaces these two entries with the x, or solution, vector.
Additional matrix functions are available; they include commands to return
the trace of a matrix (the sum of the diagonal elements), the matrix
product of two arrays, the Kronecker product of two arrays, the determinant
of a matrix, and the inverse of a matrix.
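
The review does not state which algorithm the solve operator uses. As one plausible approach, the sketch below solves y = Ax by Gaussian elimination with partial pivoting and also computes the trace. The function names solve and trace are hypothetical.

```python
def solve(a, y):
    """Solve y = A x for x by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Work on an augmented copy so the caller's data is left untouched.
    m = [list(row) + [rhs] for row, rhs in zip(a, y)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def trace(a):
    """Sum of the diagonal elements of a square matrix."""
    return sum(a[i][i] for i in range(len(a)))


A = [[2.0, 1.0], [1.0, 3.0]]
y = [3.0, 5.0]
print(solve(A, y), trace(A))
```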

CHOOSING FROM THE MENU 

The main menu of ASYSTANT+ provides 11 options that enhance the 
versatility of the program. These options include graphics, a waveform
generator and processor, two file operations, user functions,
curve fitting, polynomials, statistics, differential 
equations, and a data acquisition menu. 

Graphics. ASYSTANT+'s graphics commands allow data to be displayed on
the screen, on a graphics printer, or on a pen plotter. Graphics boards,
printers, and plotters are selected from menus at the beginning of the
initial session, and the selection can be changed at the beginning of any
session thereafter. 

Arrays are used to store graphics data. Two types of graphic displays
can be generated: Cartesian plots and three-dimensional plots. Cartesian
plots include line graphs of a single vector variable or a row of a
rectangular array, plotted as a function of the indices, and line graphs of
two vector variables or rows of rectangular arrays, with one variable
or row taken as the independent variable and the other as the dependent
variable.

Three-dimensional representations include axonometric plots and 
contour plots of two-dimensional arrays (shown in photo 2). An axonometric
plot displays a surface representing the values of the plotted array 
superimposed over a rectangular grid; the height of the surface above the 
grid is proportional to the value of the array element. A contour plot 
displays a series of contour lines superimposed over a grid with the 
contour lines connecting elements of equal magnitude. 

The graphics display is available to preview graphics before plotting.
The default screen display includes a graphics menu and a graphics window.
The graphics window can be split into left and right halves, upper and lower
halves, and four quarters.

ASYSTANT+ is able to produce a plot with a minimum of
information, by using default values and scaling the axes to display all of
the data in a single plot. The Setup command gives the user the ability to
customize the plot by specifying minimum and maximum
values, linear or logarithmic scales, labels, grids, and the location
of the origin. Whenever an IBM Enhanced Graphics Adapter (EGA) is used, the
axes, labels, background, and plot can be displayed in different colors.

Users also can customize graphics windows with the addition of 
text labels. Labels can be positioned and aligned as desired. The contents 
of a graphics window can be saved to disk, and recalled at a later time for 
display. 

A graphics display is generated by selecting the type of plot--y 
Auto, y Plot, xy Auto, xy Plot, xy Axis, Axon, or contour. The program 
prompts for the variable to be plotted and then displays a menu that 
includes the selections display graph and to plotter; these selections 
produce screen displays and plots. 

Waveforms. ASYSTANT+ includes both a waveform generator and 
processor. The generator creates arrays of values that represent a variety 
of continuous waveforms typically available from analog function 
generators. These include sine waves, cosine waves, square waves, 
triangular waves, sawtooth waves, pulses, uniform noise, white noise, and 
Poisson pulse trains. In addition to selecting the type of waveform, the 
user can control the gain, bias, and frequency of the waveform. These 
created arrays can be displayed on the screen, stored on disk, plotted on 
the pen plotter, and used as the digital input to a digital-to-analog
converter in ASYSTANT+.

The waveform generator produces a single output, one of the waveforms
listed above. However, the output can be stored in memory and then pushed 
onto the stack. Successive output waveforms can be pushed onto the stack, 
and then the calculator can be used to manipulate or combine them, creating 
waveforms of arbitrary complexity. 
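
The idea of generating simple waveforms and then combining them can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and expressing frequency as a number of cycles per array is an assumption, since the review does not describe how ASYSTANT+ specifies it.

```python
import math


def sine_wave(n_points, cycles, gain=1.0, bias=0.0):
    """Return n_points samples spanning `cycles` full periods of a sine wave."""
    return [
        bias + gain * math.sin(2 * math.pi * cycles * i / n_points)
        for i in range(n_points)
    ]


def add_waveforms(a, b):
    """Combine two waveforms of equal length point by point."""
    if len(a) != len(b):
        raise ValueError("waveforms must have the same length")
    return [x + y for x, y in zip(a, b)]


# A fundamental plus a weaker third harmonic, built from two simple outputs.
fundamental = sine_wave(64, 1)
harmonic = sine_wave(64, 3, gain=0.5)
combined = add_waveforms(fundamental, harmonic)
print(len(combined), max(combined))
```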

Within the waveform generator, two waveforms are immediately
available: the output of the generator and a waveform stored in memory. 
The output waveform can be added to the memory waveform to create complex 
waveforms without leaving the generator. Waveforms can be plotted on either 
the screen or the plotter. 

The waveform processor provides a graphic alternative to the 
calculator for processing one-dimensional arrays (waveforms) or specified 
rows of two-dimensional arrays. The waveform processor display includes a 
large window in which a waveform is displayed, a series of small 
windows that summarize the history of the wave processing session, 
and a menu of commands. 

The commands available in the waveform processor are a subset of 
those available in the calculator and file processor. However, intermediate 
results are displayed on the screen interactively, and several graphic 
aspects of display can be specified by the user. 

Waveforms can be processed in segments, allowing uninteresting 
portions of the waveform to be ignored, or separate segments to be 
processed in different ways. A current segment can be selected graphically, 
by positioning two cursors in the main graphic window. Segments of the
waveform are stored in several repositories: WFM (waveform), ORG (original
segment), MEM (memory segment), PRV (previous segment), and SEG (current
segment). Images of the repositories are shown at the top of the screen for
reference; contents of MEM and SEG can be combined with selections from the 
waveform processor's memory ops menu. 

Processing options include scaling the waveform with a fifth-degree
polynomial, clipping SEG to a specified minimum and maximum,
computing the derivative of the waveform (to a user-specified order),
computing the integral, smoothing the current segment, computing the power
spectrum, and finding the envelope of the waveform. An analysis menu
provides selections to find the basic statistics, rise time, fall
time, area under the curve, and width of a specified peak.

Data file operations. Two submenus from the main menu are devoted to 
file operations: file I/O, and file processor. File I/O provides the basic 
facilities for storing and retrieving data associated with variables 
and for converting data files into files that can be used by other 
programs. The program supports two external formats: DIF and ASCII.

ASYSTANT+ data files are physically composed of a block of comments 
followed by a series of data subfiles. Logically, the file can consist of 
comments and data sets. 

Both subfiles and data sets contain multiple data points, and both 
are limited to 64KB, which corresponds to the area in RAM that ASYSTANT+ 
sets aside for the storage of variables. A data file can contain 
several blocks that may represent various aspects of a model or experiment. 

ASYSTANT+'s file I/O menu allows subfiles and data sets to be
selected as rectangular sections of a group of arrays. Even though the data
file is actually a linear sequence of values, data can be addressed by row
and column number, just as if the data were arranged in two dimensions.
Data sets can be selected by specifying values or by scrolling through the
file graphically.
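
Addressing a linear sequence by row and column is a matter of simple index arithmetic. The sketch below assumes row-major storage and zero-based indices, neither of which the review confirms for ASYSTANT+ files; the function names are hypothetical.

```python
def linear_index(row, col, n_cols):
    """Map a (row, col) address to a position in a row-major linear sequence."""
    return row * n_cols + col


def select_block(data, n_cols, rows, cols):
    """Extract a rectangular section from linearly stored data."""
    return [[data[linear_index(r, c, n_cols)] for c in cols] for r in rows]


# Twelve values treated as a table of 3 rows by 4 columns.
values = list(range(12))
print(select_block(values, 4, rows=range(1, 3), cols=range(1, 3)))
```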

The file processor menu integrates calculator functions and disk I/O 
functions. The processing capabilities of the desktop calculator and the 
file processor are identical. However, the file processor allows the user 
to specify the data source, the operations to be performed, and the 
destination for the results. The actual processing can be allowed to 
proceed unattended, whereas processing with the desktop calculator usually 
must be performed step by step. 

Curve fitting. The curve fitting option of ASYSTANT+ provides an
interactive environment for fitting smooth curves through x-y data sets.
Results are displayed as mathematical values and in graphic form. 

The fitted curve can be specified as linear, polynomial, 
logarithmic, exponential, multilinear, or user-defined. Multilinear 
fits operate on one rectangular array and one vector, and the 
remaining fits operate on two vectors. The goodness of fit is
determined by the least-squares fitting method.

Both the original data and the fitted curve are displayed, 
superimposed in a graphic window. The residual error curve is plotted in a 
separate window. 
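
For the simplest case, a straight-line fit, the least-squares calculation and the residual error curve can be sketched as follows. This is a generic textbook formulation, not ASYSTANT+'s code, and the function names are hypothetical.

```python
def linear_fit(xs, ys):
    """Least-squares straight line through (x, y) data: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals(xs, ys, slope, intercept):
    """Difference between each data point and the fitted line."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]
slope, intercept = linear_fit(xs, ys)
print(slope, intercept, residuals(xs, ys, slope, intercept))
```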

Polynomials. An extensive set of polynomial operations can be 
performed from the polys menu. Polynomials can be added, subtracted, 
multiplied, divided, and shifted by a factor. Polynomial coefficients can 
be edited and copied to a variable. Roots can be extracted and saved in a 
variable, and polynomials can be integrated and differentiated. Finally, 
selections are provided to generate Legendre, Laguerre, Tchebyshev, and 
Hermite polynomials. 

ASYSTANT+ can handle 10 polynomials. Each polynomial can contain real 
or complex coefficients and can be up to the ninth degree. A polynomial is 
first defined, and then it can be applied to the top stack entry. 
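
A few of these polynomial operations are sketched below, with coefficients stored lowest degree first (an arbitrary choice for this example). The function names are hypothetical, and the same code works with real or complex coefficients.

```python
def poly_eval(coeffs, x):
    """Evaluate a polynomial by Horner's rule (coefficients lowest degree first)."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def poly_derivative(coeffs):
    """Differentiate term by term; a constant differentiates to [0]."""
    return [i * c for i, c in enumerate(coeffs)][1:] or [0]


def poly_multiply(a, b):
    """Multiply two polynomials by convolving their coefficients."""
    product = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            product[i + j] += x * y
    return product


p = [1, 2, 3]  # 1 + 2x + 3x^2
print(poly_eval(p, 2), poly_derivative(p), poly_multiply([1, 1], [1, 1]))
```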

Statistics. The stats selection of the main menu presents a 
submenu of statistical operations and messages. An edit function is 
available to allow the user to create or edit a data table without leaving 
the menu. The stats editor is identical to the array editor that is
provided in the desk calculator.

The basic stats option computes and displays the basic 
statistics for a variable or subset of a variable. The 
statistics displayed include the maximum value, the 
minimum value, the sum of the values, the mean, the median, the 
variance, the standard deviation, skewness, kurtosis, the sum of the 
squares, and the root mean square. These values are displayed in a window 
on the screen and can be sent to the printer. Other basic statistical 
functions such as sorting, percentile calculations, and hypothesis testing 
also can be performed from the menu. The hypothesis tests that are provided 
include the Kolmogorov-Smirnov normality test, the 1-sample t test, the
2-sample t test, the 1-sample chi-square test, the 2-sample F test, the
Wilcoxon signed-rank test, and the Mann-Whitney rank-sum test. 
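
Most of the basic statistics listed above can be reproduced with Python's standard statistics module, as in the sketch below. Two assumptions apply: the variance and standard deviation here are the sample (n - 1) versions, which may or may not match ASYSTANT+'s definition, and skewness and kurtosis are omitted because their definitions vary between packages. The function name basic_stats is hypothetical.

```python
import math
import statistics


def basic_stats(values):
    """Return a dictionary of basic descriptive statistics."""
    n = len(values)
    sum_sq = sum(v * v for v in values)
    return {
        "max": max(values),
        "min": min(values),
        "sum": sum(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "std_dev": statistics.stdev(values),      # sample standard deviation
        "sum_of_squares": sum_sq,
        "rms": math.sqrt(sum_sq / n),
    }


stats = basic_stats([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
for name, value in stats.items():
    print(name, value)
```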

Histograms can be generated and plotted. The user specifies the 
number of breakpoints between "bins". The program sets up the specified 
number of bins, equally spaced between the minimum and maximum 
data values. Once generated, the histogram can be plotted, saved to a disk 
file, or left in the calculator variables. 
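
Equally spaced binning between the minimum and maximum can be sketched as follows. For simplicity this hypothetical histogram function takes the number of bins directly, rather than the number of breakpoints that ASYSTANT+ asks for.

```python
def histogram(values, bins):
    """Count values in `bins` equal-width bins spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value belongs to the last bin, not to a bin past the end.
        index = min(int((v - lo) / width), bins - 1) if width else 0
        counts[index] += 1
    return counts


print(histogram([1, 2, 2, 3, 5], 4))
```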

A menu selection is available to generate commonly used frequency 
distributions. These include both percentages and percentiles of the normal 
distribution, the chi-squared distribution, the Student's t distribution,
and the F(n,m) distribution. 

Two advanced analysis techniques are provided by ASYSTANT+. Stepwise
regression is included with three variations of the analysis of variance
(ANOVA) technique: one-way, two-way, and table. The ANOVA techniques
indicate which of several independent variables are most significant 
in explaining the variations in the dependent variable. ASYSTANT+ displays 
the results of ANOVA in a table listing the sum of the squares, the degrees 
of freedom, the mean sum of the squares, the F-value, and the significance 
level of the F-value for each component and the residuals. 

The regression option allows the construction of a model representing 
a dependent variable as a linear function of several independent 
variables. A vector holds the dependent variable, and an array holds 
the independent variables. The technique is interactive. 

Terms can be entered into and removed from the model with a few 
keystrokes; this allows several combinations of terms to be examined 
easily. 

Differential equations. ASYSTANT+ provides a numerical method for 
solving first-order differential equations, ranging from a single equation 
to a system of five equations, using the fourth-order Runge-Kutta method.
Up to six variables are used: the X variable for the independent
variable, and Y, Z, U, V, and W for dependent variables.

The model to be examined is specified by entering the system of 
differential equations, the initial conditions, and extrapolation
parameters, consisting of the step size used to generate the solution
curves and the final X-value. Solution curves are stored in variables
that can be displayed on the screen under the graphics menu, saved to disk, 
or sent directly to the plotter. 
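
The classical fourth-order Runge-Kutta method for a system of first-order equations can be sketched as below. The inputs mirror those named above (the equations, the initial conditions, the step size, and the final X-value), but the function name rk4 and its interface are hypothetical, not ASYSTANT+'s.

```python
import math


def rk4(f, x0, y0, step, x_final):
    """Integrate y' = f(x, y) with the classical fourth-order Runge-Kutta method.

    f takes x and a list of dependent variables and returns their derivatives.
    Returns the X values and the list of solution vectors at each step.
    """
    x, y = x0, list(y0)
    xs, ys = [x], [y]
    n_steps = round((x_final - x0) / step)
    for _ in range(n_steps):
        k1 = f(x, y)
        k2 = f(x + step / 2, [yi + step / 2 * k for yi, k in zip(y, k1)])
        k3 = f(x + step / 2, [yi + step / 2 * k for yi, k in zip(y, k2)])
        k4 = f(x + step, [yi + step * k for yi, k in zip(y, k3)])
        y = [
            yi + step / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)
        ]
        x += step
        xs.append(x)
        ys.append(y)
    return xs, ys


# Two coupled equations: Y' = Z and Z' = -Y, with Y(0) = 0 and Z(0) = 1.
xs, ys = rk4(lambda x, v: [v[1], -v[0]], 0.0, [0.0, 1.0], 0.1, 1.0)
print(xs[-1], ys[-1][0], math.sin(1.0))
```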

Notepad. ASYSTANT+ includes a simple screen editor, the notepad, 
which is available from both text and graphics screens by pressing Ctrl-N. 
The manual cautions that the notepad is not intended to take the place of a 
word processor; however, the editor is equal to the task of taking notes 
during experiments and creating simple reports. 

The notepad is limited to straight ASCII text files with no control 
characters (such as the ones inserted by most word processors), 16KB total
file size, and 80-character lines. Arrow keys and function keys are
implemented, to provide cursor movement by character, word, line, and
file. A limited set of block operations is available, as well as search and 
replace capability. 

Text can be inserted into the current notepad file when the editor
itself is inactive; ASYSTANT+ stores the current file name and a cursor
location. The calculator functions menu includes a print command that sends
the top stack entry to the screen, printer, or current notepad disk file.
Disk file output can be inserted at the current cursor location or appended
to the end of the file. Charts and tables can be constructed in the stack 
with the various matrix operators and functions, edited with the aedit 
command, and then inserted into the notepad file. 

Mini-calculator. A streamlined version of the desk calculator, the 
mini-calculator, is available from both text and graphics displays when any 
of the main menu options is active. Only the command line can be used for 
input; menu input is not available, and those commands that are only 
available as menu selections cannot be called from the mini-calculator. The 
display consists of the stack and a command line. 

DOS commands and help. A menu of basic DOS operations can be invoked 
by pressing Ctrl-D. Menu selections can delete, copy, and rename files, 
display directories, and return to ASYSTANT+. An on-line help facility can
be invoked by pressing the ? key. It is context sensitive and organized to 
follow the structure of the manual. The help display can be paged by 
pressing the Space Bar, or navigated with the function keys. 

ACQUIRING THE DATA 

In addition to the basic ASYSTANT facilities, ASYSTANT+ includes the 
software necessary to control data acquisition hardware. The host computer,
under the control of ASYSTANT+, becomes the control panel and graphic 
display for several such devices. In each case, the computer display 
resembles a traditional analog instrument. 

Data acquisition functions are available from the data acquisition 
menu, which is displayed when the acquire option is selected from the main 
menu. This menu includes selections for the various instruments ASYSTANT+ 
can emulate and a selection for configuring the software to match the data 
acquisition board or external chassis. 

Configuration of the system is menu-driven. It consists of selecting 
the host computer and the data acquisition board from lists of supported 
devices and then setting various parameters to match the physical 
configuration of the data acquisition board. The manual astutely warns the
user that determining the physical configuration of the hardware may not be
a trivial matter. A detailed appendix provides information about the
configuration of supported boards; it is presented clearly and concisely
enough to replace most data acquisition board manuals for standard
applications.

It should be noted that configuration involves specifying the host 
computer as well as the data acquisition board, even though the program is 
in use on the host computer. The program must know the clock speed of the 
host computer to perform timing tasks. 

Data acquisition board parameters that are specified during the 
configuration process include the board's I/O address, the number of A/D 
channels, the A/D channel voltage range, the hardware gain, the number of 
D/A channels, and the D/A voltage range. ASYSTANT+ does not necessarily 
support all of the features and configurations of supported boards, but the 
manual documents the ones that it does support.

Additional configuration parameters, selected from the
acquisition configuration menu, include confirmation that a hardware
scroller board (a high-speed strip-chart recorder) is installed, the
specification of engineering units to be used in file conversion, color
assignments for A/D channels when an EGA board is installed, the assignment
of names to channels, and a bit pattern to be set on the digital output
port at the beginning of a data acquisition session. A final option is the
selection of an unprotected mode. ASYSTANT+ normally operates in a
protected mode, in which it prevents acquisition of data at sampling rates
above that known to be reliable (the Nyquist rate). The unprotected mode
allows the user to specify higher sampling rates at the risk of hanging the
system, requiring a reboot.

With the data acquisition board installed and configured, ASYSTANT+ 
provides the user with the ability to select the preferred interface, or 
metaphor, from the data acquisition menu. Each selection performs the same 
basic task, that of controlling the data acquisition board, but it 
resembles a different laboratory instrument (see figure 2).

ASYSTANT+ can simulate a strip-chart recorder, a hardware scroller 
(if one is installed), an XY recorder, a transient recorder, a data logger, 
a high-speed recorder, a signal averager, and a function generator. When
an instrument is selected, the program displays a submenu including options 
to set or modify instrument parameters, to begin acquiring data, and 
to return to the data acquisition menu. Set-up parameters can be 
saved to disk and recalled. 

In general, acquisition parameters are common to all of the
instruments, although some of them require the specification of additional
parameters. ASYSTANT+ displays the current parameters on a 
configuration screen, along with appropriate limitations, and prompts the 
user for new values. The parameters required to set up a 
general-purpose instrument for a session are trigger type, internal or 
external clock, number of analog input channels, the first channel in a 
scan cycle, value for the software gain, the acquisition rate, the number 
of data points per channel, the number of scans to perform in the session, 
and the file to be used for data storage (file storage is optional) . 

Because data acquisition boards typically multiplex several analog 
input channels through a single analog-to-digital converter and have limits
on the speed at which they can operate, these parameters are
interrelated. For example, in the high-speed recorder mode, the 
maximum acquisition rate is inversely proportional to the number of 
channels selected. 
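
The inverse relationship follows from sharing one converter among the channels. In the sketch below, the 30,000 samples-per-second figure is a made-up converter rate chosen only for illustration, and the function name is hypothetical.

```python
def per_channel_rate(converter_rate_hz, channels):
    """Maximum sampling rate per channel when one converter is multiplexed."""
    if channels < 1:
        raise ValueError("at least one channel must be selected")
    return converter_rate_hz / channels


# Hypothetical converter rate; real limits depend on the board in use.
for channels in (1, 2, 4, 8):
    print(channels, per_channel_rate(30_000, channels))
```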

ASYSTANT+ extends the operation of its waveform generator to the 
control of the data acquisition hardware, allowing the system to operate as 
a function generator. The digital values determined by the function 
generator are used to produce analog signals with the data acquisition 
board's digital-to-analog converter. The function generator 
provides two output channels, taking arrays stored in variables R and 
S as the input waveforms. The function generator can create standard 
waveforms, experimental waveforms acquired from earlier sessions, and 
waveforms that have been processed by ASYSTANT+. ASYSTANT+'s function
generator is capable of providing signals that are not available from 
conventional analog function generators. It is limited in speed and 
resolution to a throughput of 300 to 400 points per second. 

The function generator can be used as a stand-alone device or in 
conjunction with other ASYSTANT+ instruments. In either mode, the 
generator's output can be controlled interactively. As a stand-alone 
device, it can replace a conventional generator and drive a plotter or 
real strip-chart recorder to produce a hard copy of a waveform. When used
in conjunction with the other instruments, the generator can provide a
known stimulus or control signal to the experiment. Using the generator
with other ASYSTANT+ devices can affect the operation of the generator or
the other device, reducing the throughput of the acquisition instrument.
The program, however, does allow the operator to set the priorities of 
concurrent tasks. 

ASYSTANT+'s strip-chart recorder is a digital replacement for an 
eight-channel strip-chart recorder. The screen display resembles an analog
strip-chart recorder with data points that appear at the right edge of the
display and move across the screen as if on moving paper. The screen
displays only the active channels, providing greater resolution as the
number of channels is reduced from the maximum of eight.

The strip-chart recorder is limited to a maximum throughput of
40 to 70 Hz (points per second in this context); the exact maximum
rate depends upon the hardware configuration. If the maximum number
of channels is selected, and data are output to disk concurrently, the 
throughput is reduced. Thus, the recorder is suited only to slowly varying 
signals. If data file output is not selected, the data are lost once they 
scroll off the screen. 

While it is operating, the strip-chart recorder can be controlled.
The data acquisition rate and gain can be altered; data file output
can be suspended and resumed; and the display resolution can be modified by
skipping data points. If the function generator is active, it may also be
adjusted.

The XY recorder acquires data from a maximum of two channels
and displays the data on an xy plot, with one channel's input corresponding to
the x axis and the other corresponding to the y axis. It is possible to
display vertical and horizontal grids either individually or together. 

The XY recorder has a higher throughput, ranging from 340 to 670 Hz,
than does the strip-chart recorder. The difference in speed is due to the
limit of two channels and to the lack of concurrent data file output, which
is available only between scan cycles. The user can select a single scan mode
in which the recorder pauses to allow data file output or a continuous scan
in which data file output is not an option.

The XY recorder can be interactively controlled. While the recorder 
is acquiring and plotting data, the user can set the acquisition rate and 
programmable gain, adjust the function generator (if it is enabled), change
the display increment, and halt the scan. Between scans, data can be saved
to disk if data file output was selected; then the next scan can be 
initiated, and the current scan can be displayed versus time, superimposed 
on the xy plot. 

To acquire data before and after an event in an experiment, the
transient recorder captures and plots analog data in two stages, based on
two triggers. It can acquire data on as many as eight channels with a
maximum throughput of 340 to 800 Hz. The user must specify two
triggers to begin acquisition of data for each stage. The recorder acquires
and then plots the data. As with the XY recorder, data can be output to a
disk file only between scans. A continuous mode and active control during 
operation are available. 

The data logger is a low-speed device that provides for analog data 
input from up to four channels and the control of eight digital lines. Its 
throughput is limited to 1 Hz. However, concurrent data file output, 
realtime conversion of voltage to engineering units, and simultaneous 
hard-copy output are available. Data are displayed in text form on the 
screen in realtime. 

Setting the acquisition parameters for the data logger requires
three screens instead of the usual one for selecting and configuring the
analog input channels. Screens are provided to define from one to four
stages and up to six alarm triggers. The stages allow the acquisition rate
and control logic to be varied during the course of an experiment. The
alarm triggers control the display of messages and output of user-defined
bit patterns on the digital lines according to analog input levels or
digital input bit patterns.

The ability to place bit patterns on the digital port allows the data 
logger to be used as a controller. It can monitor and display up to four
process variables measured with analog sensors, and it can monitor
the states of as many as eight digital, two-position devices. Based on
these conditions, the data logger can provide an eight-bit digital output, 
which can be used to control eight digital devices or, if suitably 
converted, an analog device. It cannot directly control a proportional 
control device. 

The high-speed recorder provides the highest sampling rate of the 
ASYSTANT+ instruments, matched only by the signal averager. Depending upon 
the data acquisition hardware, the sampling rate may exceed 30 kHz. The
sampling rate that is realized is affected by the number of channels 
specified, as well as by the add-on hardware limitations. 

This high-speed recorder performs its tasks sequentially, first 
acquiring the data, then plotting them on the screen, and finally recording 
them to disk. Users can disable the screen display to reduce the time 
between scans. Active control is provided, allowing the data plot to be 
examined in detail between each of the scans. 

The signal averager is similar to the high-speed recorder, offering 
the same sampling rate and number of channels and storing a cumulative 
average of multiple scans. It allows data file output only at the end of a 
session, at which point it stores the current cumulative average. The
display is similar to that of the high-speed recorder; however, it shows
the current scan and the cumulative average scan superimposed for each
channel.
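
A cumulative average of multiple scans can be maintained without storing every scan, by folding each new scan into the running average. The sketch below shows one common way to do this; the review does not describe ASYSTANT+'s actual method, and the function name is hypothetical.

```python
def update_average(average, scan, scans_so_far):
    """Fold one new scan into a point-by-point cumulative average.

    `scans_so_far` is the number of scans already included in `average`.
    """
    n = scans_so_far + 1
    return [avg + (x - avg) / n for avg, x in zip(average, scan)]


scans = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
average = [0.0, 0.0]
for count, scan in enumerate(scans):
    average = update_average(average, scan, count)
print(average)
```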

HARDWARE CONSIDERATIONS 

ASYSTANT+ runs on the IBM PC family of computers, as well as on 
compatibles. The full 640KB of RAM supported by PC-DOS must be installed, 
along with an 8087 or 80287 math coprocessor, two diskette drives or one 
diskette and one hard-disk drive, and a supported graphics board. Supported 
graphics boards include the IBM Color Graphics Adapter (CGA), the IBM EGA,
the Hercules Graphics Card, the AT&T High-Resolution card, and the HP 
Vectra Multimode adapter. 

The program performs the basic ASYSTANT tasks without installing 
additional hardware. However, if data acquisition is to be performed, 
ASYSTANT+ does require that a data acquisition board or external data 
acquisition chassis be used. Supported data acquisition hardware includes 
the Cyborg Issac 911, the Dataq WFS-200PC Waveform Scroller, Data 
Translation's DT2800 series, IBM's Data Acquisition and Control Adapter,
the Keithley Series 500 system, MetraByte's DASH-16 board, and Tecmar's Lab
Master and Lab Tender boards. (See "Digitizing Analog Data," Eric M. 
Miller, May 1986, p. 52 for reviews of some of these products.) 

ASYSTANT+ is a demanding program. In addition to installing 640KB of
RAM, the user must ensure that the maximum amount of RAM is 
available. TSR (terminate and stay resident) programs and device drivers 
must be kept to a minimum; the safest course is to use only the 
standard DOS configuration. 

For this article, ASYSTANT+ was tested on a Heathkit H-241 
AT-compatible computer, with 640KB of RAM, 2,176KB of extended memory, an 
80287 numeric coprocessor, a Concept Technologies ConceptBoard graphics 
adapter, and a Data Translation DT2801A data acquisition board. 



Although ASYSTANT+ can operate on a dual-diskette system, a hard disk
should be considered a practical requirement. Macmillan furnishes ASYSTANT+
on six diskettes; running the program from diskette drives requires
frequent swapping of diskettes and severely limits file storage.

Program configuration is an option when the program is first loaded. 
The program displays a sign-on message and then a menu with options to 
recall functions, parameters, and variables from a disk file, 
to perform hardware configuration, and to begin using the program. The 
second selection, Setup, displays a configure menu, with options for 
selecting the display, plotter, and printer, and for disk assignments for 
the system overlay, data, and help files. The initial installation of the 
program consists of copying the files from the distribution disks. 
Configuration is accomplished at the beginning of the initial session and
can be repeated at the beginning of any subsequent session.

ASYSTANT+ uses a straightforward method of configuring and
controlling a data acquisition board. However, installation of a data 
acquisition board in a typical microcomputer system may require the 
reconfiguration of other boards, the use of a nonstandard configuration of 
the data acquisition board, or the removal of other boards. Most data 
acquisition boards are designed and factory-configured to operate in a
standard microcomputer system, and ASYSTANT+ assumes the use of a 
factory-configured board. Microcomputers that have multiple video boards, 
high-resolution graphics boards, nonstandard mass storage device 
controllers, mice, scanners, and other accessories may be difficult to
configure. 

The program allows the specification of the I/O address of the data 
acquisition board, and most data acquisition boards can be jumpered to one 
of several addresses. Selecting an unused I/O address in a complex system 
may not be trivial, but it can be accomplished with some research. 

To provide high-performance hardware, many data acquisition board 
companies incorporate circuitry to use the computer's DMA channels, as do 
the manufacturers of hard-disk controllers, tape backup systems, optical 
scanners, network interface boards, and other high-performance accessories. 
The standard PC has four DMA channels, two of which are free for 
accessories; the XT has only one free channel to support all of the 
accessories that require DMA services. ASYSTANT+ does not use DMA, but some 
acquisition boards must be configured to use DMA. The user must pay
attention to this issue. 

Some data acquisition boards implement a memory mapped addressing 
scheme rather than an I/O addressing scheme, using the memory above the 
base 640KB of user RAM. 

These boards, designed when it appeared that there were "holes" in 
the PC's memory map, may conflict with the EGA and other video boards or 
with other accessories that use normally vacant segments of the memory map. 

RATING THE PERFORMANCE 

As a calculator, ASYSTANT+ is a high-performance program. Most 
computational tasks, including matrix operations, are performed almost 
instantaneously. A few of the advanced operations are slower, but still 
reasonably fast, requiring a few seconds at most. 

As a data acquisition system, ASYSTANT+ realizes the potential of the 
microcomputer. Critical elements of the program are written in assembly 
language to attain the highest possible speed of operation. However, a 
microcomputer is limited by its design as a general purpose computing 
machine. Overall system throughput is limited by the speed of the data 
acquisition board, the clock speed of the computer, and the speed with 
which data can be written to disk. ASYSTANT+ achieves its ultimate 
performance, which is essentially the performance limit of the data 
acquisition accessory, by dedicating the host computer to controlling the 
accessory and transferring the acquired data to RAM. Graphic displays and 
disk I/O are performed between acquisition tasks. 

ASYSTANT+, a data acquisition board, and a microcomputer will not 
replace a battery of high-performance, dedicated laboratory instruments. 
Dedicated instruments are able to offer higher sampling rates, sometimes by 
factors of hundreds or thousands, than does an ASYSTANT+ data acquisition 
system. Furthermore, they provide higher accuracy and resolution. As an 
example, an HP 3852S Data Acquisition and Control System, suitably 
configured, can acquire 100,000 readings per second and store up to the 
order of 64,000 readings locally. High-performance digital storage 
oscilloscopes and waveform analyzers can acquire data at sampling rates of 
tens of millions of samples per second. Nevertheless, the ASYSTANT+ based 
system is a sound solution to the data acquisition problem. An example of 
ASYSTANT+'s uses is given in the accompanying sidebar. 

It should be noted that the basic acquisition and analysis of data 
is provided by the data acquisition hardware and not the program. The 
ambitious experimenter/programmer may be able to do quite well without 
ASYSTANT+ by writing custom software to control the hardware. But the 
average experimenter, who must concentrate on the task at hand, will find 
that ASYSTANT+ makes configuring a comprehensive system a relatively 
straightforward procedure. Writing custom software to match ASYSTANT+'s 
analysis and presentation capabilities could not be done within a 
reasonable timeframe. 

THE SOFTWARE PACKAGE 

ASYSTANT+ comes with seven diskettes. The program is copy protected; 
a key diskette must be inserted in a diskette drive to load the program. An 
alternative to the key diskette arrangement is available from Macmillan in 
the form of a hardware protection device. All of the software can be copied 
to the hard disk or to the diskette drive with the DOS Copy command. 

The manual is a 2-inch, loose-leaf binder with 8 1/2 by 11-inch pages. 
It includes a tutorial, a reference section, several appendices, and an 
index, all separated with tabbed dividers. A hard slipcase is included. 
Both the printing and packaging are excellent. 

The tutorial is thorough and accurate. It guides the user through the 
essential features of ASYSTANT+. Although the tutorial assumes that the 
user already has some knowledge of data acquisition, it is suitable for use 
as a refresher for occasional practitioners, or as an introduction for a 
determined beginner. The tutorial can be completed in a reasonable amount 
of time. 

The reference section is well organized, closely following the 
program's menus. It covers the simulated instruments in considerable 
detail. The user will seldom have to refer to the data acquisition hardware 
documentation if the hardware is controlled exclusively with ASYSTANT+. 

One possible drawback is that the manual is definitely not a 
mathematics textbook. The advanced math functions available in the 
calculator are summarized only briefly. Users who occasionally require 
Bessel functions and fast Fourier transforms may need to keep an assortment 
of math textbooks handy. The sister product, ASYST, provides a more 
insightful tutorial for using the mathematical functions. 

ASYSTANT+ adds realtime data acquisition capabilities to the ASYSTANT 
calculator, which rivals any general purpose computational tool, 
microcomputer-based or not, in terms of speed, ease of use, and functions. 
The data acquisition capabilities obviously do not match those of dedicated 
instruments. However, they do provide a comprehensive assortment of 
techniques for applications that can tolerate moderate sampling rates and 
provide these features at much lower cost than dedicated instruments. An 
ASYSTANT+ system is a well-balanced solution to moderate data acquisition 
needs and a high-performance solution to analysis needs. 

ASYSTANT+: $895 

Macmillan Software Company 

866 3rd Avenue 

New York, NY 10022 

212/972-3960 

CIRCLE 348 ON READER SERVICE CARD 

Victor E. Wright is the manager of process engineering at Luckett & 
Farley, located in Louisville, Kentucky. 

AN ELECTRONIC DETECTIVE 

In a practical application, ASYSTANT+ can be used as a sophisticated 
detective in an industrial plant. As an example, a plant engineer installs 
a tachometer on a component of a production line, and it produces a clean, 
square wave. However, when the tachometer is connected to the control panel 
several hundred yards away, the control panel display is greatly altered 
and meaningless. The plant engineer connects a microcomputer with a data 
acquisition board and ASYSTANT+ installed, and finds a signal like the ones 
shown in figure 1, instead of the square wave. 

The plant engineer then takes the ASYSTANT+ equipped microcomputer to 
the tachometer and measures the signal directly. As expected, its output is 
normal, the square wave shown in figure 2. Evidently, the signal is being 
degraded between the tachometer and the control panel. Because the line 
from the tachometer to the control room is routed through the plant, past 
various machines and switchgear, the plant engineer is not surprised. The 
problem now is to identify the noise components and their sources. 

With the noisy signal at the control panel and the square wave 
sampled at the tachometer stored in ASYSTANT+ variables, the engineer 
is ready to begin analyzing the signal. After verifying that the square 
wave and the noisy signal samples represent the same time interval and the 
same number of data points, the engineer subtracts the square wave from the 
composite signal. Subtracting the two arrays stored in the variables 
from each other and storing the result in another variable leaves just the 
noise that is picked up in the system. The resulting waveform, plotted in 
figure 3, is still made up of several components. 
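
The subtraction step can be sketched outside ASYSTANT+ as a point-by-point difference of two equal-length records. The sample rate, record length, 100 Hz tachometer frequency, and synthetic pickup below are assumptions made only so the sketch runs; the function names are illustrative and are not ASYSTANT+ commands.

```python
import math

SAMPLE_RATE = 6000  # samples per second (assumed for illustration)
N_POINTS = 600      # a 0.1 s record (assumed)


def square_wave(freq, t):
    """Ideal +/-1 square wave at freq Hz, evaluated at time t in seconds."""
    return 1.0 if (freq * t) % 1.0 < 0.5 else -1.0


def subtract(a, b):
    """Point-by-point difference a - b; both records must be the same length."""
    if len(a) != len(b):
        raise ValueError("records must contain the same number of data points")
    return [x - y for x, y in zip(a, b)]


TIMES = [k / SAMPLE_RATE for k in range(N_POINTS)]
CLEAN = [square_wave(100.0, t) for t in TIMES]                    # at the tachometer
PICKUP = [0.3 * math.sin(2 * math.pi * 60.0 * t) for t in TIMES]  # synthetic noise
NOISY = [c + p for c, p in zip(CLEAN, PICKUP)]                    # at the control panel

# What remains after the subtraction is the noise picked up along the line.
NOISE = subtract(NOISY, CLEAN)
```

The length check mirrors the engineer's verification that both samples cover the same interval with the same number of points; subtracting misaligned records would leave part of the square wave in the result.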

On a logical hunch, the plant engineer tries subtracting a 60 Hz sine 
wave, to remove any "power hum." After a few attempts with the waveform 
processor to get the correct amplitude, the waveform of figure 4 results. 
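
The trial-and-error amplitude search can, as an alternative, be replaced by a direct fit. The sketch below is not the waveform processor's method: it projects the record onto 60 Hz sine and cosine terms, which gives the least-squares amplitudes when the record spans a whole number of cycles of every component present. The sample rate, record length, and the 20 Hz and 500 Hz test components are assumptions for illustration.

```python
import math

SAMPLE_RATE = 6000  # samples per second (assumed)
N_POINTS = 600      # 0.1 s: a whole number of cycles at 20, 60, and 500 Hz


def fit_sinusoid(samples, freq, sample_rate):
    """Amplitudes (a, b) of a*sin + b*cos at freq, found by projection."""
    n = len(samples)
    w = 2.0 * math.pi * freq / sample_rate
    a = 2.0 / n * sum(x * math.sin(w * k) for k, x in enumerate(samples))
    b = 2.0 / n * sum(x * math.cos(w * k) for k, x in enumerate(samples))
    return a, b


def remove_sinusoid(samples, freq, sample_rate):
    """Subtract the fitted sinusoid at freq from the record."""
    a, b = fit_sinusoid(samples, freq, sample_rate)
    w = 2.0 * math.pi * freq / sample_rate
    return [x - a * math.sin(w * k) - b * math.cos(w * k)
            for k, x in enumerate(samples)]


def tone(amplitude, freq):
    """Synthetic sine wave used only to exercise the sketch."""
    return [amplitude * math.sin(2.0 * math.pi * freq * k / SAMPLE_RATE)
            for k in range(N_POINTS)]


HUM = tone(0.3, 60.0)
OTHER = [x + y for x, y in zip(tone(0.5, 20.0), tone(0.2, 500.0))]
NOISE = [h + o for h, o in zip(HUM, OTHER)]

HUM_SIN, HUM_COS = fit_sinusoid(NOISE, 60.0, SAMPLE_RATE)
RESIDUAL = remove_sinusoid(NOISE, 60.0, SAMPLE_RATE)
```

Fitting both a sine and a cosine term also handles hum of unknown phase, which a single trial sine wave would not.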

At this point, two components are clearly discernible, a high 
frequency sine wave riding on a lower frequency sine wave. The frequency of 
each waveform is easily determined, at least in this simplified example. 
With the frequencies of these components known, the engineer can set about 
locating their sources. For a more complicated situation, other methods 
such as plotting the power spectrum can be used. 
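
For the more complicated case, a power spectrum can be sketched with a direct discrete Fourier transform. The record below is synthetic (a 20 Hz and a 500 Hz sine wave, both assumed), and the function names are illustrative; the direct transform is slow for long records, where a fast Fourier transform would be the usual choice.

```python
import cmath
import math

SAMPLE_RATE = 6000  # samples per second (assumed)
N_POINTS = 600      # 0.1 s record, giving 10 Hz between spectrum bins


def power_spectrum(samples, sample_rate):
    """Return (frequency in Hz, power) pairs for bins 0 through n/2."""
    n = len(samples)
    spectrum = []
    for m in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * m * k / n)
                  for k, x in enumerate(samples))
        spectrum.append((m * sample_rate / n, abs(acc) ** 2 / n))
    return spectrum


def strongest(spectrum, count):
    """Frequencies of the count most powerful bins, in ascending order."""
    ranked = sorted(spectrum, key=lambda pair: pair[1], reverse=True)
    return sorted(freq for freq, _ in ranked[:count])


# Synthetic residual: a high-frequency sine riding on a lower-frequency sine.
RESIDUAL = [0.5 * math.sin(2 * math.pi * 20.0 * k / SAMPLE_RATE)
            + 0.2 * math.sin(2 * math.pi * 500.0 * k / SAMPLE_RATE)
            for k in range(N_POINTS)]

PEAKS = strongest(power_spectrum(RESIDUAL, SAMPLE_RATE), 2)
print(PEAKS)
```

The frequency resolution equals the sample rate divided by the number of points, so a longer record is needed to separate components that lie close together.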

— Victor E. Wright 
CAPTIONS: Calculator menus, (chart); Data acquisition menu, (chart) 

ABSTRACT: Systat Inc's $495 FASTAT 2.0 offers a moderate range of 
statistical data analysis functions to business users who have exceeded the 
capabilities of their spreadsheet but do not need the flair of heavy duty 
statistics software. Release 2.0 is a considerable improvement over the 
original Macintosh version, which placed a rudimentary shell around a 
command-line, batch-oriented statistics suite. FASTAT is half the price of 
the more powerful SYSTAT; users seeking more than FASTAT provides must pay 
an additional $400 to upgrade. FASTAT's user interface remains somewhat 
clumsy. Sorting or recoding variables requires data to be saved to a new 
file, and the program can only have one file open at a time. FASTAT's 
graphs are somewhat dull, despite promotional claims of 
presentation-quality graphics. 
TEXT: 

Inexperienced statisticians gain few advantages with SYSTAT's less 
costly sibling. 

Billed as "easy-to-use statistics for real world analyses," FASTAT is 
tailored for business users who have outgrown the statistical-analysis 
capabilities of their spreadsheet programs. At half the price of its more 
sophisticated sibling, SYSTAT 5.2, FASTAT 2.0 can save you money if you 
don't require a full range of statistical procedures. 

However, we discovered that despite its scaled-down functionality, 
FASTAT offers no real ease-of-use advantages to inexperienced statisticians 
— it turns out to be every bit as difficult to use as SYSTAT. 

Family Resemblance 

Although FASTAT and SYSTAT share many of the same menu commands and 
dialog boxes, FASTAT doesn't provide the full range of 
statistical procedures and graph types that SYSTAT does. Still, FASTAT is 
no lightweight: the program provides a healthy assortment of statistical 
tools and graph types, including factor analyses, time-series analyses, 
regression analyses, factorial ANOVA, scatterplot matrices, 3-D spin 
plots, probability plots, and function plots. 

Both FASTAT and SYSTAT have been substantially improved since their 
initial releases. The first versions of both programs clearly demonstrated 
that they were derived from a batch-oriented command-line-based ancestor. 
The only concession to the graphical environment of the Mac was a 
surrounding shell comprising a few simple menu commands. 

With this new version of FASTAT, however, menu commands and dialog 
boxes have almost completely replaced the command-line interface. FASTAT's 
data editor uses a familiar spreadsheet-like format. You can move, copy, 
paste, and edit data sets just as you do in a worksheet. In addition, the 
company has added a palette of plot tools that lets you enhance graphs. 

Exploratory Tools 

One of FASTAT 2.0's best new features is its selection of tools for 
identifying specific points on scatterplot graphs. When you click on a 
point in a graph, FASTAT highlights the corresponding record in the data 
editor, which makes it easy to isolate points for further analysis. 
Similarly, when viewing the results of an analysis, you can Option-click on 
any variable to bring up a pop-up menu that lists related graphs and 
statistics for that variable. With the addition of this feature, FASTAT 
becomes more versatile — it can serve not only as a traditional 
hypothesis-testing tool but also as an exploratory-analysis tool. 

For those of you who frequently work with data comprising many 
variables, FASTAT's Define Bundles command is another plus. By letting you 
define any variable subset as a bundle, it eliminates the need to scroll 
through all variable names to find the ones you want. When you select the 
bundle, only its variables appear in the variable-selection 
list. You can define as many as five bundles and shift among them by 
clicking on the bundle icon. 

Despite these improvements, FASTAT is still cumbersome. For example, 
sorting and recoding variables often require you to save the converted data 
to a separate file. To make matters worse, you can't have more than one 
file open at a time. 

No Hot Links 

When FASTAT completes an analysis, it places the results in a window. 
But the window is not hot-linked to the data editor, so you must run a new 
analysis each time you make a change in the data set. This makes it 
difficult to compare the effects of adding or deleting data elements. 
Similarly, to make even a minor modification to a graph (such as adding a 
best-fitting regression line to a scatterplot), you must recreate the 
entire analysis from scratch. 

Equally irritating, the results of ANOVA analyses completely 
disappeared when we requested related supplementary analysis. This forced 
us to redo the initial ANOVA analysis each time we wanted to try an 
additional supplementary analysis. 

If you make a mistake, that's just too bad -- FASTAT's Undo command 
is rarely active when you need it most. FASTAT does provide extensive 
context-sensitive on-line help, however, including a general Help window; 
an Information window, which provides in-context definitions of terms; a 
Balloon Help-like feature that explains each menu command; and mini help 
messages in every dialog box. 

Unfortunately, FASTAT's manual is not as impressive as the on-line 
help. For those who are already familiar with statistical procedures, the 
manual is adequate. But for those users who require a tutorial to help them 
learn unfamiliar procedures, it falls far short. Here's an example: 
Although the manual instructs you to select the MGLH (multivariate general 
linear hypothesis) option to access FASTAT's regression and ANOVA commands, 
it never explains MGLH. Ironically, the SYSTAT manual devotes an entire 
chapter to the meaning of this term. 

Also, if you want to add a legend to a graph, you must first select 
the Symbol dialog box and assign separate symbols to each variable. The 
manual makes no mention of this requirement. Furthermore, the Symbol dialog 
box requires you to enter numbers that correspond to the symbols -- you 
can't simply click on the symbols themselves. 

Another weakness is in the area of graphing. Although the FASTAT 
package promotes the program's "presentation-quality graphics," we found 
the quality not to be on a par with that of other statistical packages in 
FASTAT's price range. 

In order to get data into FASTAT, you can either enter it directly 
into a data-editor worksheet, import it as a text file, or import it from 

The Bottom Line 

The latest release of FASTAT is a big improvement over the previous 
version. Its interface is significantly enhanced, although there's still 
more work to be done before FASTAT fully exploits the Mac's graphical 
abilities. Improvements aside, FASTAT remains difficult to use. 

FASTAT 2.0 provides basic statistical procedures and graphing options 
and several more-advanced techniques at half the price of its more powerful 
sibling, SYSTAT. Still, in our view, it would be a better strategy for the 
company to offer a single midrange program, such as FASTAT, and make 
more-advanced features available as separate, optional plug-in modules. As 
it is now, FASTAT users who need more-advanced features must lay out an 
additional $400 to upgrade to SYSTAT, which duplicates many of the features 
they already have in FASTAT. 

Compared with other midrange statistical programs for the Mac, FASTAT 
has an interface that puts it at a significant disadvantage. StatView, for 
example, has a well-designed interface and statistical power that's 
comparable to FASTAT's. And Data Desk and JMP are better choices than 
FASTAT if your main requirement is exploratory data analysis. 

If your statistical demands are relatively light, a spreadsheet 
program may be all you need. Excel 4.0, in particular, has beefed up its 
statistical-analysis power by providing ANOVA and regression tools. By and 
large, spreadsheet programs offer more-flexible data entry features and 
more-attractive graphs than statistics programs do. 

SYSTAT, however, remains the first choice for those who need the most 
complete range of sophisticated statistical-analysis procedures and graph 
types. 

ABSTRACT EP 514122 A2 

A scheduling system and method for use with training systems. The 
exemplary embodiment of the scheduler is incorporated into an aircrew 
training system for a military aircraft. A training system for training 
aircrews involves the use of academic media such as classrooms, training 
devices such as ground-based flight simulation trainers, and training 
flights in the air. In addition, it involves a computer network having 
terminals located at a central site, a plurality of training sites, and 
other remote sites. The computer data base is located at a central site, 
and the training facilities are located at training sites. Typically, 
computer terminals are connected together in a computer network by both 
dedicated and dial-up telephone lines, and typically the network may 
employ Intel 80386 machines running UNIX V, release 3.2. The scheduler of 
the present invention comprises an integrated system of hardware and 
software which is integrated into the already existing training system. 
It is embedded as a software subsystem in the training system, and is 
delivered on a type 80386 integrated circuit based computer element at 
each training site. (See image in original document.) 

CLAIMS EP 514122 A3 

1. An expert system scheduler for flexibly scheduling training events at 
a plurality of training sites notwithstanding the occurrence of 
resource conflicts, each training site comprises one of a plurality 
of distributed computers that are interconnected by means of an 
interconnecting link, the plurality of distributed computers 
interconnected to a central processor including a database, and 
wherein the remainder of the computers comprise remote processors, 
and wherein the database comprises: (1) a list of students input from 
the plurality of remote computers, (2) a list of instructors, and (3) 
a list of available flight training events, and wherein the 
availability of the students, the instructors and the available 
training events vary over time, and wherein the expert system 
scheduler comprises processing means that are disposed on each of the 
remote processors, said expert system scheduler comprising: 

means for selectively generating a master plan in response to 
training requests supplied by users, which master plan provides an 
event flow that specifies target dates for each training event, but 
does not specify the exact time or resources and does not take into 
account whether sufficient resources are available on a target date, 
which training requests inform the scheduler that a specific number 
of users should be scheduled for a particular training event, and 
stipulate required starting and ending dates for the events, the 
master plan providing users with a preview of the proposed event 
sequence and an overview of all events which are targeted for the 
same date; 

means for adjusting the master plan so that users may adjust 
starting, interim and ending training dates in order to express 
preferred scheduling constraints; 

means for selectively generating a master schedule in response 
to training requests and the users' preferred scheduling constraints 
which reserves specific dates, times, locations and resources for 
each training event to fulfill scheduled training requests; 

means for generating conflict alerts to notify users if 
conflicts with the master schedule exist; 

means for generating revised training requests in response to 
conflict alerts; and 

means for automatically generating schedule revision options in 
response to the revised training requests which appropriately 
reschedule the sites in view of conflicts; 

wherein the expert system scheduler flexibly schedules and 
reschedules training events at each of the sites notwithstanding 
resource conflicts, and wherein schedules are automatically generated 
and conflicts resolved. 

2. The expert system scheduler of Claim 1 which further comprises: 

means for selectively generating schedules indexed on user, 
instructor, resource, and event in response to user requests. 

3. The expert system scheduler of Claim 1 which further comprises: 

means for displaying the master plan to provide facilities for 
users to review and revise the master plan. 

4. The expert system scheduler of Claim 1 which further comprises: 

means for initializing a workshift calendar and a scheduling 
parameters database that stipulate general scheduling 
constraints, which parameters are used by the scheduler to 
restrict the dates and times during which resources may be used and 
to establish priorities by which resources are allocated. 

5. An expert system scheduler for flexibly scheduling training events at 
a plurality of training sites notwithstanding the occurrence of 
resource conflicts, each training site comprises one of a plurality 
of distributed computers that are interconnected by means of an 
interconnecting link, the plurality of distributed computers 
interconnected to a central processor including a database, and 
wherein the remainder of the computers comprise remote processors, 
and wherein the database comprises: (1) a list of students input from 
the plurality of remote computers, (2) a list of instructors, and (3) 
a list of available flight training events, and wherein the 
availability of the students, the instructors and the available 
training events vary over time, and wherein the expert system 
scheduler comprises processing means that are disposed on each of the 
remote processors, said expert system scheduler comprising: 

means for selectively generating a master plan in response to 
training requests supplied by users, which master plan provides an 
event flow that specifies target dates for each training event, but 
does not specify the exact time or resources and does not take into 
account whether sufficient resources are available on a target date, 
which training requests inform the scheduler that a specific number 
of users should be scheduled for a particular training event, and 
stipulate required starting and ending dates for the events, the 
master plan providing users with a preview of the proposed event 
sequence and an overview of all events which are targeted for the 
same date; 

Abstract of EP1065827 

On a transaction network that supports short-duration electronic transactions within multiple service classes between 
input terminals and host processors, such as for credit card purchases, a network anomaly detector monitors the 
network to determine a potential fault either on or off the network before an actual network failure occurs. The network 
anomaly detector is provided with current transaction data for ongoing transactions, which data for each transaction 
includes the service class of the transaction, the start time of the transaction and the duration of the transaction. The 
current transaction data is converted to a traffic intensity, which provides a temporal measure of the traffic on the 
network within each predetermined binning interval for each service class. For each service class, that binning 
interval is computed by the detector as a function of the median of the durations of transactions having the same 
service class from past transaction data so that a large percentage of transactions would statistically be expected to 
have a duration less than one interval. That binning interval is also used to convert historical transaction data for each 
class into temporal upper and lower traffic intensity thresholds. If the traffic intensity generated from current data 
exceeds the upper threshold or falls below the lower threshold by longer than a predetermined time, an alarm is 
sounded to indicate an anomaly. Corrective action can then be taken to remove the anomalous condition. 
Periodically, the historical data used to generate the upper and lower thresholds is updated with more recent 
transaction data so that the thresholds more closely follow current data trends. 
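
The detection scheme in this abstract can be sketched for a single service class as follows. The abstract does not give formulas, so the specifics here are assumptions: the binning interval is taken as a fixed multiple of the median duration, traffic intensity as busy time per bin divided by bin length, the thresholds as the historical mean plus or minus three standard deviations, and the alarm delay as a count of consecutive bins. All names and numbers are illustrative.

```python
from statistics import mean, median, pstdev


def binning_interval(past_durations, factor=3.0):
    """Bin length as a multiple of the median past duration (factor assumed)."""
    return factor * median(past_durations)


def traffic_intensity(transactions, interval, n_bins):
    """Per-bin busy time divided by bin length; transactions are (start, duration)."""
    busy = [0.0] * n_bins
    for start, duration in transactions:
        end = start + duration
        first = max(int(start // interval), 0)
        last = min(int(end // interval), n_bins - 1)
        for b in range(first, last + 1):
            lo = max(start, b * interval)
            hi = min(end, (b + 1) * interval)
            busy[b] += max(hi - lo, 0.0)
    return [value / interval for value in busy]


def thresholds(historical, k=3.0):
    """Lower and upper limits as mean -/+ k standard deviations (k assumed)."""
    centre, spread = mean(historical), pstdev(historical)
    return max(centre - k * spread, 0.0), centre + k * spread


def alarms(current, lower, upper, min_bins=2):
    """Bin indices at which an out-of-band run first reaches min_bins bins."""
    raised, run = [], 0
    for index, value in enumerate(current):
        run = run + 1 if (value > upper or value < lower) else 0
        if run == min_bins:
            raised.append(index)
    return raised


INTERVAL = binning_interval([1.0, 2.0, 2.0, 3.0, 10.0], factor=5.0)
INTENSITY = traffic_intensity([(0.0, 5.0), (8.0, 4.0), (25.0, 5.0)], INTERVAL, 3)
LOWER, UPPER = thresholds([0.4, 0.5, 0.6])
ALARMS = alarms([0.5, 0.9, 0.95, 0.5, 0.05, 0.5], 0.2, 0.8, min_bins=2)
```

Requiring several consecutive out-of-band bins corresponds to the abstract's condition that the intensity stay outside the thresholds for longer than a predetermined time, which suppresses alarms on a single unusual bin.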

Abstract of WO02093292 

The present invention proposes a method for 
calculating, presenting, and combining probabilistic 
functional maps (PFMs) of the human brain 
representing the probability of structures existing. The 
method comprises three major steps: reading of data 
containing the coordinates of contacts, calculating the 
PFMs, and presenting the PFMs. The data can be read 
from a file in text or binary format or from a database 
as local or remote client. The PFM calculation 
comprises the following steps: forming 3D models of 
contacts, normalizing the contact models, voxelizing 
the contact models, calculating an atlas function, and 
calculating the PFM. The PFM can be presented alone 
or along with anatomical atlases. Both 3D and 2D 
interfaces can be used for presentation. The proposed 
method also includes different ways of combining the 
contact data and/or existing PFMs from multiple 
sources. This mechanism is the basis of an internet 
portal for stereotactic and functional neurosurgery. 
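
A rough sketch of the voxelizing and map-calculation steps follows. The abstract does not define the contact model, the atlas function, or the PFM formula, so everything here is an assumption: contacts are modeled as spheres, the atlas function counts how many contacts cover each voxel, and the map divides that count by the number of contacts. The normalization step is omitted.

```python
import math
from collections import Counter
from itertools import product


def voxelize_sphere(center, radius, voxel=1.0):
    """Indices of voxels whose centre points lie inside the sphere."""
    lo = [math.floor((c - radius) / voxel) for c in center]
    hi = [math.ceil((c + radius) / voxel) for c in center]
    cells = set()
    for idx in product(*(range(a, b + 1) for a, b in zip(lo, hi))):
        dist2 = sum((i * voxel - c) ** 2 for i, c in zip(idx, center))
        if dist2 <= radius ** 2:
            cells.add(idx)
    return cells


def atlas_function(contacts, voxel=1.0):
    """Count, for every voxel, how many contact models cover it."""
    counts = Counter()
    for center, radius in contacts:
        counts.update(voxelize_sphere(center, radius, voxel))
    return counts


def probability_map(counts, n_contacts):
    """Fraction of contacts covering each voxel (an assumed PFM definition)."""
    return {cell: count / n_contacts for cell, count in counts.items()}


# Two hypothetical contacts, one voxel apart, each with a radius of one voxel.
CONTACTS = [((0.0, 0.0, 0.0), 1.0), ((1.0, 0.0, 0.0), 1.0)]
PFM = probability_map(atlas_function(CONTACTS), len(CONTACTS))
```

Storing only covered voxels in a dictionary keeps the sketch small; a dense 3D array would be the natural choice once the map has to be displayed alongside an anatomical atlas.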

Abstract of WO02095633 

Disclosed are techniques used in connection with 
determining a health indicator (HI) of a component, 
such as that of an aircraft component. The HI is 
determined using condition indicators (CIs) which 
parameterize characteristics about a component, 
minimizing the possibility of a false alarm. Different 
algorithms are disclosed which may be used in 
determining one or more CIs. The HI may be 
determined using a normalized CI value. Techniques 
are also described in connection with selecting 
particular CIs that provide for maximizing separation 
between HI classifications. Given a particular HI at a 
point in time for a component, techniques are 
described for predicting a future state or health of the 
component using the Kalman filter. Techniques are 
described for estimating data values as an alternative 
to performing data acquisitions, as may be used when 
there is no pre-existing data. 
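
The prediction step can be sketched with a small Kalman filter that tracks a health indicator and its rate of change, then extrapolates. The abstract names the Kalman filter but not the model, so the two-state constant-rate model, the noise settings, and the synthetic HI history below are all assumptions for illustration.

```python
def kalman_track(measurements, dt=1.0, q=1e-6, r=1e-4):
    """Track HI level and rate; q is process noise, r is measurement noise."""
    level, rate = 0.0, 0.0
    p00, p01, p10, p11 = 1.0, 0.0, 0.0, 1.0  # initial state covariance
    for z in measurements:
        # Predict: level advances by rate * dt, rate is assumed constant.
        level = level + dt * rate
        n00 = p00 + dt * (p01 + p10) + dt * dt * p11 + q
        n01 = p01 + dt * p11
        n10 = p10 + dt * p11
        n11 = p11 + q
        # Update: only the level is measured.
        s = n00 + r
        k0, k1 = n00 / s, n10 / s
        innovation = z - level
        level += k0 * innovation
        rate += k1 * innovation
        p00, p01 = (1.0 - k0) * n00, (1.0 - k0) * n01
        p10, p11 = n10 - k1 * n00, n11 - k1 * n01
    return level, rate


def forecast(level, rate, steps, dt=1.0):
    """Extrapolate the HI the given number of steps ahead."""
    return level + steps * dt * rate


# Synthetic history: an HI that degrades by 0.01 per acquisition.
HISTORY = [0.01 * k for k in range(1, 51)]
LEVEL, RATE = kalman_track(HISTORY)
FORECAST = forecast(LEVEL, RATE, 10)
```

Tracking the rate as a second state is what allows a forecast at all; a filter on the level alone could only predict that the HI stays where it is.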

Preface 



How to Use This Guide 

This guide introduces the MATLAB statistics environment through the toolbox 
functions. It describes the functions with regard to particular areas of interest, 
such as probability distributions, linear and nonlinear models, principal 
components analysis, design of experiments, statistical process control, and 
descriptive statistics. It also describes use of the graphical tools. 

"Introduction"                  Introduces the toolbox, and explains the mathematical
                                notation it uses.

"Probability Distributions"     Describes the distributions and the
                                distribution-related functions supported by the
                                toolbox.

"Descriptive Statistics"        Explores toolbox features for working with
                                descriptive statistics such as measures of location
                                and spread, percentile estimates, and data with
                                missing values.

"Linear Models"                 Describes toolbox support for one-way, two-way, and
                                higher-way analysis of variance (ANOVA), analysis of
                                covariance (ANOCOVA), multiple linear regression,
                                stepwise regression, response surface prediction,
                                ridge regression, and one-way multivariate analysis
                                of variance (MANOVA). It also describes support for
                                nonparametric versions of one- and two-way ANOVA,
                                and multiple comparisons of the estimates produced
                                by ANOVA and ANOCOVA functions.

"Nonlinear Regression Models"   Discusses parameter estimation, interactive
                                prediction and visualization of multidimensional
                                nonlinear fits, and confidence intervals for
                                parameters and predicted values. It also introduces
                                classification and regression trees as a way to
                                approximate a regression relationship.

"Hypothesis Tests"              Describes support for common tests of hypothesis:
                                t-tests, Z-tests, nonparametric tests, and
                                distribution tests.

"Multivariate Statistics"       Explores toolbox features that support methods in
                                multivariate statistics, including principal
                                components analysis, factor analysis, one-way
                                multivariate analysis of variance, cluster analysis,
                                and classical multidimensional scaling.

"Statistical Plots"             Describes box plots, normal probability plots,
                                Weibull probability plots, control charts, and
                                quantile-quantile plots which the toolbox adds to
                                the arsenal of graphs in MATLAB. It also discusses
                                extended support for polynomial curve fitting and
                                prediction, creation of scatter plots or matrices
                                of scatter plots for grouped data, interactive
                                identification of points on such plots, and
                                interactive exploration of a fitted regression
                                model.

"Statistical Process Control"   Discusses the plotting of common control charts and
                                the performing of process capability studies.

"Design of Experiments"         Discusses toolbox support for full and fractional
                                factorial designs, response surface designs, and
                                D-optimal designs. It also describes functions for
                                generating designs, augmenting designs, and
                                optimally assigning units with fixed covariates.

"Demos"                         Describes GUIs that enable you to explore the
                                probability distributions, random number
                                generation, curve fitting, and design of
                                experiments functions.

"Reference"                     Lists the functions for each area supported by the
                                toolbox, and provides a complete description of
                                each function.

"Selected Bibliography"         Lists published materials that support concepts
                                described in this guide.



Information about specific functions and tools is available online and in the 
PDF version of this document. For functions and graphical tools, reference 
descriptions include a synopsis of the syntax, as well as a complete explanation 
of options and operation. Many reference descriptions also include examples, a 
description of the function's algorithm, and references to additional reading 
material. "Demos" on page 11-1 further describes the use of the graphical tools. 



Related Products List 



The MathWorks provides several products that may be relevant to the kinds of 
tasks you can perform with the Statistics Toolbox. 

For more information about any of these products, see either: 

• The online documentation for that product if it is installed or if you are 
reading the documentation from the CD 

• The MathWorks Web site, at http://www.mathworks.com; see the "products" 
section 



Note: The toolboxes listed below all include functions that extend the 
MATLAB capabilities. The blocksets all include blocks that extend Simulink's 
capabilities. 



Product                         Description

Data Acquisition Toolbox        Acquire and send out data from plug-in data
                                acquisition boards

Database Toolbox                Exchange data with relational databases

Financial Time Series Toolbox   Analyze and manage financial time series data

Financial Toolbox               Model financial data and develop financial
                                analysis algorithms

GARCH Toolbox                   Analyze financial volatility using univariate
                                GARCH models

Image Processing Toolbox        Perform image processing, analysis, and
                                algorithm development

Mapping Toolbox                 Analyze and visualize geographically based
                                information

Neural Network Toolbox          Design and simulate neural networks

Optimization Toolbox            Solve standard and large-scale optimization
                                problems

Signal Processing Toolbox       Perform signal processing, analysis, and
                                algorithm development

System Identification Toolbox   Create linear dynamic models from measured
                                input-output data

Typographical Conventions 

This manual uses some or all of these conventions. 



Item: Example code 
Convention: Monospace font 
Example: To assign the value 5 to A, enter 
    A = 5 

Item: Function names, syntax, filenames, directory/folder names, and user input 
Convention: Monospace font 
Example: The cos function finds the cosine of each array element. 
    Syntax line example is 
    MLGetVar ML_var_name 

Item: Buttons and keys 
Convention: Boldface with book title caps 
Example: Press the Enter key. 

Item: Literal strings (in syntax descriptions in reference chapters) 
Convention: Monospace bold for literals 
Example: f = freqspace(n,'whole') 

Item: Mathematical expressions 
Convention: Italics for variables; standard text font for functions, operators, and constants 
Example: This vector represents the polynomial p = x^2 + 2x + 3. 

Item: MATLAB output 
Convention: Monospace font 
Example: MATLAB responds with 
    A = 
         5 

Item: Menu and dialog box titles 
Convention: Boldface with book title caps 
Example: Choose the File Options menu. 

Item: New terms and for emphasis 
Convention: Italics 
Example: An array is an ordered collection of information. 

Item: Omitted input arguments 
Convention: (...) ellipsis denotes all of the input/output arguments from preceding syntaxes. 
Example: [c,ia,ib] = union(...) 

Item: String variables (from a finite list) 
Convention: Monospace italics 
Example: sysc = d2c(sysd,'method') 
What Is the Statistics Toolbox? 

The Statistics Toolbox, for use with MATLAB, supplies basic statistics 
capability on the level of a first course in engineering or scientific statistics. 
The statistics functions it provides are building blocks suitable for use inside 
other analytical tools. 

The Statistics Toolbox is a collection of tools built on the MATLAB numeric 
computing environment. The toolbox supports a wide range of common 
statistical tasks, from random number generation, to curve fitting, to design of 
experiments and statistical process control. The toolbox provides two 
categories of tools: 

• Building-block probability and statistics functions 

• Graphical, interactive tools 

The first category of tools is made up of functions that you can call from the 
command line or from your own applications. Many of these functions are 
MATLAB M-files, series of MATLAB statements that implement specialized 
statistics algorithms. You can view the MATLAB code for these functions using 
the statement 

type function_name 

You can change the way any toolbox function works by copying and renaming 
the M-file, then modifying your copy. You can also extend the toolbox by adding 
your own M-files. 

Secondly, the toolbox provides a number of interactive tools that let you access 
many of the functions through a graphical user interface (GUI). Together, the 
GUI-based tools provide an environment for polynomial fitting and prediction, 
as well as probability function exploration. 
Primary Topic Areas 

The Statistics Toolbox has more than 200 M-files, supporting work in these 
topical areas: 

Probability Distributions 

The Statistics Toolbox supports 20 probability distributions. For each 
distribution there are five associated functions. They are 

• Probability density function (pdf) 

• Cumulative distribution function (cdf) 

• Inverse of the cumulative distribution function 

• Random number generator 

• Mean and variance as a function of the parameters 

For data-driven distributions (beta, binomial, exponential, gamma, normal, 
Poisson, uniform, and Weibull), the Statistics Toolbox has functions for 
computing parameter estimates and confidence intervals. 

Descriptive Statistics 

The Statistics Toolbox provides functions for describing the features of a data 
sample. These descriptive statistics include measures of location and spread, 
percentile estimates and functions for dealing with data having missing 
values. 

Linear Models 

In the area of linear models, the Statistics Toolbox supports one-way, two-way, 
and higher-way analysis of variance (ANOVA), analysis of covariance 
(ANOCOVA), multiple linear regression, stepwise regression, response surface 
prediction, ridge regression, and one-way multivariate analysis of variance 
(MANOVA). It supports nonparametric versions of one- and two-way ANOVA. 
It also supports multiple comparisons of the estimates produced by ANOVA 
and ANOCOVA functions. 

Nonlinear Models 

For nonlinear models, the Statistics Toolbox provides functions for parameter 
estimation, interactive prediction and visualization of multidimensional 
nonlinear fits, and confidence intervals for parameters and predicted values. It 
provides functions for using classification and regression trees to approximate 
regression relationships. 

Hypothesis Tests 

The Statistics Toolbox also provides functions that do the most common tests 
of hypothesis — t-tests, Z-tests, nonparametric tests, and distribution tests. 

Multivariate Statistics 

The Statistics Toolbox supports methods in multivariate statistics, including 
principal components analysis, factor analysis, one-way multivariate analysis 
of variance, cluster analysis, and classical multidimensional scaling. 

Statistical Plots 

The Statistics Toolbox adds box plots, normal probability plots, Weibull 
probability plots, control charts, and quantile-quantile plots to the arsenal of 
graphs in MATLAB. There is also extended support for polynomial curve fitting 
and prediction. There are functions to create scatter plots or matrices of scatter 
plots for grouped data, and to identify points interactively on such plots. There 
is a function to interactively explore a fitted regression model. 

Statistical Process Control (SPC) 

For SPC, the Statistics Toolbox provides functions for plotting common control 
charts and performing process capability studies. 

Design of Experiments (DOE) 

The Statistics Toolbox supports full and fractional factorial designs, response 
surface designs, and D-optimal designs. There are functions for generating 
designs, augmenting designs, and optimally assigning units with fixed 
covariates. 
Chapter III. Statistical analysis of sequences 



Here we take the statistical analysis of sequences (which we started when considering 
the significance of the alignment) one step further. The high point of the discussion is the 
most popular alignment algorithm: BLAST (Basic Local Alignment Search Tool). The 
theory of BLAST is quite complex and a full account of it is beyond the scope of 
the present discussion. Nevertheless, we will outline the main ideas and the formulas that 
are used. We start with yet another look at the statistical argument that we used when 
evaluating the significance of the scores. 

The score of an alignment, $T = \sum_{i=1}^{n} s(a_i, b_i)$, is a sum of individual entries of a 
substitution matrix, where the substitutions may include gaps. The BLAST algorithm suggests an 
approximate statistical estimate for the significance of this score. Before going into 
an in-depth discussion of BLAST it is useful to outline the conceptual analogy to the 
Z score analysis and the relative advantages and disadvantages of the two approaches. In 
the Z score formulation we test whether the score is significantly higher than a typical score of a 
random sequence. Alternatively, we ask: what is the probability of obtaining by chance a 
Z score higher than a threshold value $Z_{th}$? By "high score by chance" we mean a high 
scoring alignment of a probe sequence against a random sequence sampled from a 
distribution of individual amino acids, $p(a_i)$. 

Let the probability of observing a Z score between Z and Z + dZ by chance be 
$p_Z(Z)\,dZ$ ($p_Z(Z)$ is the probability density). The answer to the above question is 
therefore 

$$P_Z(Z > Z_{th}) = \int_{Z_{th}}^{\infty} p_Z(Z)\,dZ$$ 

The smaller $P_Z(Z > Z_{th})$ is, the less likely it is that the observed score Z was obtained by 
chance. This is in principle a simple test for significance. However, there is a 
complication in practice, which is the unknown functional form of $p_Z(Z)$. A possible 
solution to the problem is based on numerical experiments. We may compute a large 
sample of alignments of pairs of sequences that are not related. The sample will be used 
to estimate the probability density (e.g. by counting the number of times that we observed 
Z scores between Z and $Z + \Delta Z$, $n(Z)$, and dividing it by the total number of alignments, 
N, and by the interval, $\Delta Z$). 
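
The counting estimate just described can be sketched in a few lines of Python. This is a 
minimal illustration, not the procedure used in any particular program: the function names, 
the bin range and the number of bins are assumptions made here. 

```python
import math

def estimate_density(z_scores, z_min, z_max, n_bins):
    """Histogram estimate of p(Z): n(Z) / (N * dZ) for each bin of width dZ."""
    dz = (z_max - z_min) / n_bins
    counts = [0] * n_bins
    for z in z_scores:
        k = math.floor((z - z_min) / dz)
        if 0 <= k < n_bins:          # scores outside [z_min, z_max) are ignored
            counts[k] += 1
    n_total = len(z_scores)          # N, the total number of alignments
    centers = [z_min + (k + 0.5) * dz for k in range(n_bins)]
    density = [c / (n_total * dz) for c in counts]
    return centers, density

def tail_probability(z_scores, z_th):
    """Empirical P(Z > z_th): fraction of sampled scores above the threshold."""
    return sum(1 for z in z_scores if z > z_th) / len(z_scores)
```

In use, `z_scores` would hold the Z scores of alignments between unrelated sequences; the 
tail fraction is then a direct numerical estimate of $P_Z(Z > Z_{th})$. 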

The consideration of the dimensionless entity, Z, versus the direct score, T, is especially 
useful in estimates of the probability density. The reference to a single variable, which 
does not depend strongly on the sequence length, makes the numerical estimates of the 
model easier and more straightforward. In contrast, the score T clearly depends on the 
length, and so does the probability density $p_T(T)$. 

Note that in the numerical evaluation described above we use the term "unrelated 
sequences". These are not necessarily "random" sequences sampled from a distribution of 
individual amino acids but are true protein sequences that are unrelated to the probe 
sequence. In other words, we changed the reference or the background distribution to 
reflect the distribution of real proteins. The use of random sequences enters only in the 
second phase, when we evaluate the Z score of one alignment (a pair of sequences). One 
of the protein sequences undergoes shuffling (randomization of the positions of the amino 
acids) and an optimal score is computed between the probe and each random sequence. 
The optimal scores of alignments against random sequences (derived from a single match 
of the probe sequence to one of the sequences in the database) are used to compute one 
Z score. 
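
The shuffling procedure can be illustrated with the following Python sketch. Several things 
here are assumptions made only for the illustration: the Z score is taken as 
(T - mean of shuffled scores) / (standard deviation of shuffled scores), the local alignment 
uses a simple match/mismatch/gap scheme instead of a real substitution matrix, and the 
number of shuffles is arbitrary. 

```python
import random
import statistics

def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score by dynamic programming (toy scoring scheme)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best

def z_score(probe, target, n_shuffles=100, seed=0):
    """Z score of the probe/target alignment against shuffled copies of target."""
    rng = random.Random(seed)
    observed = local_alignment_score(probe, target)
    letters = list(target)
    random_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)                      # randomize residue positions
        random_scores.append(local_alignment_score(probe, "".join(letters)))
    mean = statistics.mean(random_scores)
    std = statistics.pstdev(random_scores)
    if std == 0:                                  # degenerate sample of shuffles
        return float("inf") if observed > mean else 0.0
    return (observed - mean) / std
```

Note that every call to `z_score` performs `n_shuffles` additional alignments, which is the 
source of the cost discussed later in this chapter. 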

The distribution function, $P(Z > Z_{th})$, needs to be computed only once, prior to any 
calculation, and can be reused thereafter. Besides the probability of the prediction being a false 
positive, it is also possible to estimate the probability of it being a true positive. This can be 
done only by numerical experiments since there is no analytical theory for true positives. 
For that purpose we compute the distribution function of Z scores of alignments between 
related sequences. That is, we ask two questions (and provide numerical estimates for both): 
what is the probability that a Z score above the threshold is a false positive, and what is the 
probability that it is a true positive? We hope that the answer to the first question is very low 
and the answer to the second question is very high. 

Ideally the distribution of the false positive and true positive will have zero overlap. 




Figure X: Schematic drawing of overlapping distributions of false and true positives. Ideally we should have 
a score that is "only" true or "only" false. Typically we accept some probability of false positives to 
minimize the loss of true positives. The score $Z_{th}$ determines the selection boundary. 

In practice, however, this is not the case. The choice of the threshold score $Z_{th}$ that we 
use to decide whether to accept the prediction as true is made to minimize $P(Z > Z_{th})$ and 
maximize $P_{true}(Z > Z_{th})$. Clearly, the two functions provide complementary information. 
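
As a rough sketch of how a threshold could be picked from the two numerically estimated 
distributions, the Python below scans candidate thresholds and reports both fractions. The 
selection rule (lowest threshold whose false positive fraction does not exceed a chosen 
bound) and all names are assumptions made for the illustration; the text does not prescribe 
a specific rule. 

```python
def exceed_fraction(scores, z_th):
    """Fraction of scores above the threshold, an estimate of P(Z > z_th)."""
    return sum(1 for z in scores if z > z_th) / len(scores)

def scan_thresholds(unrelated, related, thresholds):
    """For each threshold return (z_th, false positive fraction, true positive fraction)."""
    return [(z_th, exceed_fraction(unrelated, z_th), exceed_fraction(related, z_th))
            for z_th in thresholds]

def lowest_threshold_with_fp_below(unrelated, related, thresholds, max_fp):
    """Lowest threshold whose false positive fraction is at most max_fp."""
    for z_th, fp, tp in scan_thresholds(unrelated, related, sorted(thresholds)):
        if fp <= max_fp:
            return z_th, fp, tp
    return None   # no candidate threshold is strict enough
```

Choosing the lowest admissible threshold keeps as many true positives as possible for the 
accepted level of false positives. 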

The above procedure, which is based on numerical calculations of the distribution functions 
for false and true positives and the careful selection of a threshold value, is very reliable. 
However (and this is a BIG however), the process is expensive and is difficult to use in 
large-scale prediction schemes. The reason is not the calculation of the distribution 
function, which is done only once and reused thereafter, but the calculation of the Z score 
itself. Each calculation of a Z score requires the alignment of tens to a thousand 
random sequences. If a single alignment costs about 0.01 to 0.1 second on a 700 MHz PC 
then a comparison of two sequences (including the Z score estimate) will take tens of 
seconds, and a search of one sequence against a small database of (say) ten thousand 
proteins will take about a day. Of course, large-scale comparisons of many genes 
against larger databases become impossible with such timing. One way of reducing the 
computational effort is to compute the Z scores only for high scoring alignments. 
However, in order not to miss true positives we still need to examine many high scoring 
alignments, and the relief provided by the above heuristic is limited. 
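
The timing estimate can be reproduced with a few lines of arithmetic. The specific values 
used below (0.05 second per alignment and 200 shuffles per Z score) are assumptions chosen 
from within the ranges quoted above, purely for illustration. 

```python
def z_score_search_seconds(seconds_per_alignment, shuffles_per_z, database_size):
    """Cost of one Z score (original alignment plus shuffles) and of a database search."""
    per_pair = seconds_per_alignment * (shuffles_per_z + 1)
    return per_pair, per_pair * database_size

per_pair, total = z_score_search_seconds(0.05, 200, 10_000)
print(f"one pair: {per_pair:.2f} s, full search: {total / 86400:.2f} days")
```

With these numbers one pair costs about ten seconds and the search of ten thousand proteins 
takes a little over a day, in line with the estimate in the text. 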

If the task at hand is a genomic-scale analysis, namely, the study of tens to hundreds of 
millions of alignments, then even dynamic programming (computing only the T scores) 
can be too expensive. 

An intermediate conclusion is therefore that the statistical arguments so far have led to 
more reliable predictions but not to more efficient calculations. 

It is possible to use the general idea of statistical significance to design a more efficient 
alignment algorithm. The twist here is not to check the statistical significance of an 
optimal alignment that was obtained by dynamic programming, but to create an 
approximate alignment using statistical considerations. We design an approximate 
alignment procedure that will pick alignments with high statistical significance. As 
explained below, the resulting algorithm is considerably more efficient than dynamic 
programming, at the expense of using approximations. On the other hand, the incorporation of 
statistical arguments into the alignment procedure makes the final decision (true or false 
positive?) better than a decision that is based only on the T score. Hence, even if the 
alignment is not optimal, the assessment by statistical significance that the two sequences 
are indeed related is typically pretty good. 

We consider first the score T of an alignment and the probability density of the score, 
$p(T)$. The T score is considerably less expensive to compute than the Z score, 
and in that sense it is more attractive. However, the T score depends strongly on a 
number of parameters, for example, the sequence length. It is necessary to develop a 
theory that will examine the dependence of the score T on different alignment 
parameters. This is one achievement of the BLAST algorithm: the development of a 
statistical theory of the T scores. We shall discuss the theory later, after understanding 
how the efficiency of match finding is achieved. 

Even if we had an exact theory of the statistical significance of a score (and we do not; 
the BLAST theory is approximate), we would still need to select (efficiently) a high scoring 
alignment in order to assess its significance. A clever idea of BLAST is to perform a 
search for high scoring short segments using gapless local alignments. The statistical 
significance test makes it possible to estimate whether the short matches are meaningful and 
worth exploring further. 
Efficient scanning of sequences in BLAST 

Consider for example the short segment WWWW that is found in both the probe 
sequence and one of the sequences in the database. Even though the match is found in 
fragments that are short (and typically shorter segments are less significant), here it is 
likely to be significant. Tryptophan is a rare residue, which is unlikely to be replaced by 
another residue. Therefore, if we have a match for four tryptophans the match is unlikely to be by 
chance, and is more likely to indicate a true relationship between the two sequences. Note 
that these short segments for which we find matches need not be identical. For instance, 
in the above example we may consider WFWW as also a match using scores from the 
usual substitution matrices. The quantification of "likely" and "unlikely" is at the core of 
the BLAST statistical estimates. Let us accept for the moment that we can quantify the 
statistical significance of matching of short segments and consider the problem of 
efficiency. 

Matches for short segments can be searched efficiently. Many technical adjustments to the 
idea we describe below are possible; however, for simplicity we focus on the most 
obvious solution rather than on the most efficient. 

One simple idea is to use hash tables and to pre-process the database of annotated 
proteins (we consider now the problem of seeking a match of an unknown sequence 
against a large database). Consider a segment of length four. There are $20^4 = 160000$ 
possible different segments. This number is large but not prohibitive. We prepare pointers 
for all four-character segments of the probe sequence. The database is scanned (the number 
of operations is of order O(N), where N is the size of the database), and every fragment 
of length 4 of the database is immediately tested against the probe sequence using the 
pointers. We comment that with advanced hard disks with rapid access, or large memory, it 
is possible to preprocess the entire database and to arrange pointers to the locations of all 
fragments in the database. In that case the probe sequence analysis will include the 
calculation of the pointers that will immediately bring us to the matches in the large 
database. The number of operations is therefore O(L), where L is the length of the probe 
sequence! Sounds great. Nevertheless, the limiting factor in this case may be the rate of 
disk access. 
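
The second variant described above (preprocessing the database so that every fragment 
points to its locations) can be sketched with a Python dictionary playing the role of the 
hash table. The function names and the toy sequences are assumptions made for the 
illustration; only exact matches of fragments are handled here. 

```python
from collections import defaultdict

def build_index(database, k=4):
    """Map every fragment of length k to the (sequence id, position) pairs where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in enumerate(database):
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((seq_id, pos))
    return index

def find_seed_matches(probe, index, k=4):
    """Look up each fragment of the probe; the cost grows with the probe length only."""
    hits = []
    for q_pos in range(len(probe) - k + 1):
        for seq_id, db_pos in index.get(probe[q_pos:q_pos + k], ()):
            hits.append((q_pos, seq_id, db_pos))
    return hits

database = ["AAWWWWKK", "MKLVWWWW"]
index = build_index(database)
print(find_seed_matches("GGWWWWGG", index))
```

Building the index costs one pass over the database; each later query needs only one 
dictionary lookup per fragment of the probe. 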

The pointer is not limited to identical matches but can also point to all other possible 
matches that score above a certain threshold $T_T$. Clearly a high threshold will make our 
life considerably simpler, since only a relatively small number of matches will be found 
that require further examination in the next step. However, the small number of 
matches will make the next phase, extending the match, considerably more difficult. 
The choice of the threshold $T_T$ is a compromise. 
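
The effect of the threshold on the number of non-identical matches can be seen in a small 
Python sketch. The score table below covers only a three-letter alphabet and is an 
illustrative assumption, not a full substitution matrix; the function names are likewise 
hypothetical, and a word is kept here when its score is at least the threshold. 

```python
from itertools import product

# Illustrative pair scores on a three-letter alphabet (assumed values).
TOY_SCORES = {("W", "W"): 11, ("F", "F"): 6, ("A", "A"): 4,
              ("W", "F"): 1, ("W", "A"): -3, ("F", "A"): -2}

def pair_score(a, b):
    """Symmetric lookup in the toy score table."""
    return TOY_SCORES[(a, b)] if (a, b) in TOY_SCORES else TOY_SCORES[(b, a)]

def neighborhood(word, threshold, alphabet="WFA"):
    """All words of the same length whose score against `word` is >= threshold."""
    result = []
    for candidate in product(alphabet, repeat=len(word)):
        score = sum(pair_score(x, y) for x, y in zip(word, candidate))
        if score >= threshold:
            result.append(("".join(candidate), score))
    return sorted(result, key=lambda item: -item[1])

print(neighborhood("WWWW", 34))   # WWWW itself plus the four single W -> F variants
print(neighborhood("WWWW", 40))   # only the identical word survives
```

Lowering the threshold quickly enlarges the list of words that must be looked up, which is 
the compromise mentioned above. 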

Once high scoring segments have been identified (hopefully their number is not too large...) 
the next step is to try to extend them beyond the pre-determined size of a fragment (in our 
discussion so far it was 4) while maintaining the significance of the (high scoring) 
alignment. It is important to emphasize that we are now left with a considerably smaller 
number of sequence pairs to probe, which makes the efficient execution of the first step 
even more important. The extension of the high scoring fragment can be made (again, for 
example) using dynamic programming and gaps, attempting to link more than one high 
scoring segment. Hence, it is useful to examine not only individual high scoring segments 
but also to consider (or assign even higher significance to) those that are close to each other. 
The disadvantage of using dynamic programming is the slowing down of the search 
procedure. In practice, direct extension of the matched parts (no gaps) seems to detect 
sequence relationships quite efficiently, so it is not obvious that the (expensive) 
implementation of dynamic programming is worth it. 
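
A gapless extension of a seed can be sketched as follows. The stopping rule used here 
(give up once the running score has fallen more than `max_drop` below the best score seen) 
is an assumption of this sketch, as are the function names and the +1/-1 scoring in the 
example; the text above does not specify how the extension is terminated. 

```python
def extend_right(a, b, i, j, score_fn, max_drop):
    """Extend without gaps from a[i], b[j]; return (best score gain, extension length)."""
    best = running = 0
    best_len = steps = 0
    while i + steps < len(a) and j + steps < len(b):
        running += score_fn(a[i + steps], b[j + steps])
        steps += 1
        if running > best:
            best, best_len = running, steps
        elif best - running > max_drop:
            break                       # score dropped too far below the best
    return best, best_len

def extend_seed(a, b, i, j, k, score_fn, max_drop=3):
    """Extend a seed of length k at a[i:], b[j:] in both directions (no gaps)."""
    seed = sum(score_fn(a[i + t], b[j + t]) for t in range(k))
    right, r_len = extend_right(a, b, i + k, j + k, score_fn, max_drop)
    # The left extension is a right extension on the reversed prefixes.
    left, l_len = extend_right(a[:i][::-1], b[:j][::-1], 0, 0, score_fn, max_drop)
    return seed + left + right, (i - l_len, j - l_len, k + l_len + r_len)

simple = lambda x, y: 1 if x == y else -1
print(extend_seed("GGACDEFGHKK", "TTACDEFGHPP", 4, 4, 4, simple))
```

The returned tuple gives the total score of the extended segment and its start positions 
and length in the two sequences. 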

It is clear that the cost of the second step should depend only weakly on the database size 
(the number of potential matches that we find will depend on the database size). As a 
result BLAST searches are efficient. 

Brief statement of BLAST statistical framework 

The theory behind BLAST that we consider next provides us with an estimate of the 
probability that two random sequences of lengths n and m will score more than $T_T$. To 
make our match unusual and more likely to be biologically significant this probability 
should be small. 

In the present approach we restrict ourselves to local alignments only. 

We consider the score, T, of aligning two sequences $A = a_1 \ldots a_n$ and $B = b_1 \ldots b_n$, 

$$T = \sum_{i=1}^{n} s(a_i, b_i)$$ 

where the matrix elements, $s(a_i, b_i)$, are the appropriate entries of the 
BLOSUM matrix. We assume that the entries are uncorrelated and therefore the score T 
is a sum of uncorrelated random numbers that are sampled from the same probability 
function. We wish to determine the probability that an observed score $T_{obs}$ was obtained 
by chance. 

It is useful to think of the score as a random walk in which the change from $T_i$ to $T_{i+1}$ is 
the change induced by one step of the walker. Since the alignment we consider is local, 
the length of the walk is not predetermined, and neither is the score. We 
terminate the alignment when further build-up of the alignment does not seem to be 
helpful. In our case this is when $T_i$ reaches a negative value (-1). Previously we terminated at 
the value of zero. The choice of different (low) termination values depends on the choice 
of the substitution matrix. At the least we require that the average value of the 
substitution matrix (over all elements), $\langle s(a,b) \rangle = \sum_{a,b} p_a p_b\, s(a,b)$, is negative. The 
probabilities $p_a$ and $p_b$ are the "background" probabilities for individual amino acids. The 
average should be negative since for sufficiently large n the score of the alignment is roughly 
$T \approx n \cdot \langle s(a,b) \rangle$. To ensure that the length of the alignment is finite the average of the 
substitution matrix, $\langle s(a,b) \rangle$, must be negative; otherwise the score and the length may 
grow to infinity. 
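
The requirement on the average of the substitution matrix is easy to check numerically. In 
the Python sketch below the three-letter alphabet, its background probabilities and the 
+1/-1 scoring are assumptions made up for the illustration. 

```python
def expected_score(background, score):
    """Average substitution score: sum over a, b of p_a * p_b * s(a, b)."""
    return sum(background[a] * background[b] * score(a, b)
               for a in background for b in background)

# Hypothetical background frequencies and a match/mismatch score.
background = {"A": 0.5, "B": 0.3, "C": 0.2}
match_mismatch = lambda a, b: 1 if a == b else -1

avg = expected_score(background, match_mismatch)
print(avg)   # negative, so random alignments tend to drift downward
```

For this toy model the probability of a match is 0.38 and of a mismatch 0.62, so the 
average score per aligned pair is -0.24 and a random alignment terminates at a finite 
length. 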
Figure: A schematic presentation of a random walk that represents a (random) alignment. The alignment starts at 
zero and terminates when it reaches the value of -1 (marked "Termination (-1)" in the drawing). No termination 
at an upper bound is assumed; however, since the substitution matrix is negative on average, the alignment 
should terminate at a finite length n. 

In BLAST we address the following questions: 

(i) What is the probability of obtaining a maximum score of an alignment, $T_T$, by 
chance before the alignment reaches -1 (i.e. what is the probability that the 
alignment is not significant)? 

(ii) What is the distribution of alignment lengths before they are terminated (by 
hitting the absorbing boundary at -1)? 

We will not follow the theory in all its glory, since some of the arguments are too 
complex to be included in the present discussion. However, we will outline a few simple 
examples demonstrating the main idea behind the BLAST approach. Before continuing 
we provide first the main result of the statistical theory. 

The probability of obtaining by chance a score T larger than or equal to $T_T$ is 

$$P(T \ge T_T) = 1 - e^{-y}$$ 

where $y = K \cdot m \cdot n \cdot \exp[-\lambda T_T]$. 

It is expected that the maximal score by chance will depend on the length of the 
sequence, and indeed m and n are the lengths of the two sequences. There are two 
parameters in the theory, K and $\lambda$, that require further discussion. For example $\lambda$, 
which is a simpler parameter, is determined by the expression 

$$\sum_{a,b} p_a p_b \exp[\lambda\, s(a,b)] = 1$$ 

Hence, the parameter $\lambda$ determines the scale of the substitution matrix. 
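
The two quantities can be evaluated numerically as in the Python sketch below. Two points 
are assumptions of the sketch: the scale parameter is taken to be the positive root of 
sum_k prob_k * exp(lambda * s_k) = 1 (the usual condition in this kind of theory), and the 
values of K, lambda, m, n used in the example are invented for illustration only. 

```python
import math

def solve_lambda(step_probs):
    """Positive root of sum_k prob_k * exp(lam * s_k) = 1, found by bisection.

    step_probs maps each possible score s_k to its probability; the mean score
    must be negative and at least one score must be positive.
    """
    def f(lam):
        return sum(p * math.exp(lam * s) for s, p in step_probs.items()) - 1.0

    lo, hi = 1e-6, 1.0          # f(lo) < 0 is assumed; the root must exceed lo
    while f(hi) <= 0.0:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def blast_p_value(score, m, n, K, lam):
    """P(T >= score) = 1 - exp(-y) with y = K * m * n * exp(-lam * score)."""
    y = K * m * n * math.exp(-lam * score)
    return -math.expm1(-y)      # numerically safe form of 1 - exp(-y)

lam = solve_lambda({+1: 0.3, -1: 0.7})      # steps of +1 and -1 only
print(lam, math.log(0.7 / 0.3))             # the two values agree
print(blast_p_value(40, 300, 300, K=0.1, lam=0.3))
```

For a walk with steps of +1 (probability p) and -1 (probability q) the root is log(q/p), 
the same exponent that appears in the random walk analysis later in this chapter. 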
It is useful to think of the score as the result of a random walk. In that case we may ask 
what is the probability that the walk will be terminated at a given upper bound instead of 
a lower bound. Hence we consider a random walk between two absorbing boundaries. 
The lower boundary is a and the upper boundary is b (note that in BLAST we consider 
only the lower boundary a, which is set to -1; the upper boundary is added here for 
convenience). We start our walk in the space of scores at zero. When no amino acids are 
aligned against each other (the beginning) the total score is zero by definition. 




> L 



Figure: A schematic presentation of a walk in score space. We always start at zero and terminate either at the lower 
boundary a, or the upper boundary b. We ask what is the probability $f_0$ that the walk will terminate at 
b and not at a. Clearly the higher b is (keeping a fixed) the lower is the probability of hitting b 
before a. 

The probability that an alignment will reach a maximum value 

We consider a walk (extension of the length of the alignment) that starts at score zero and 
at length zero. The probability and the magnitude of a step (an element of the BLOSUM 
matrix for a pair of amino acids) have significant variations for real data. However, to 
keep the discussion below simple we consider a model in which only steps of ±1 are 
allowed. Of course there are many more possibilities for real data but we can still imagine 
dividing the amino acids into two groups: hydrophobic H and hydrophilic P. If the pair 
under consideration is of the same type (i.e. H/H or P/P), then the score is set to +1, if it 
is a miss (H/P or P/H) then the score is -1 . Since the H/P model was used successfully in 
a number of simplified and semi-quantitative models, it is expected to work similarly in 
the present case. 

(Note however, that there is a fundamental problem in the above suggestion if all the 
pairs are equally probable. A possible correction is to set the score of P/P to zero. Can 
you explain why?) 

The probability of going a step up is p and a step down is q. Let $f_i$ be the probability 
that the walk terminates at the upper bound b instead of the lower bound a, starting from 
position i. After a single step we may reach the i + 1 position with probability p and 
the i - 1 position with probability q. Since the probability of termination at the upper 
boundary should be conserved, we have 

$$f_i = p f_{i+1} + q f_{i-1}$$ 

with the boundary conditions $f_a = 0$ and $f_b = 1$. 

As usual it is easy to guess a solution of the type $f_i = \exp[i\theta]$ for the above 
homogeneous equation. We have 

$$\exp[i\theta] = p \exp[(i+1)\theta] + q \exp[(i-1)\theta]$$ 

Multiplying both sides by $\exp[(1-i)\theta]$ we have 

$$p \exp[2\theta] - \exp[\theta] + q = 0$$ 

For $p \ne q$ we have two solutions: $\theta = 0$ and $\theta = \log(q/p)$. The general solution is a 
linear combination of the two, which is 

$$f_i = A_1 + A_2 \exp[i \log(q/p)]$$ 

The coefficients $A_1$ and $A_2$ are determined by the boundary conditions $f_a = 0$ and $f_b = 1$. 
We write the solution below for the probability starting at zero. 

$$f_0 = \frac{1 - \exp(a \log(q/p))}{\exp(b \log(q/p)) - \exp(a \log(q/p))}$$ 

For sufficiently high b ($b \gg |a|$) we have (we also set a = -1) 

$$f_0 \approx (1 - p/q) \exp(-b \log(q/p))$$ 
Note that we have already imposed a condition: q > p, which is the requirement that the average 
step, p - q, is negative. Without this condition the result of the last equation is meaningless. 
The probability of hitting an upper boundary is exponentially small. Of course this is what we 
should expect from a random walk. In sequence alignments it means that high scoring segments 
can easily have vanishingly small probabilities of being false positives. This probability follows 
the geometric distribution and forms the core of the statistical arguments about random alignments. 

Length of an alignment 

Another fun question that we can ask is what is the typical length of the alignment. 
Assume that we are at position i, as before, and that the number of steps that the walk takes 
(before terminating either at -1 or at b) is $L_i$. After one step the walk will be at position 
i - 1 with probability q and at position i + 1 with probability p. The length of the 
remaining walk after this single step is $L_i - 1$. Summarizing in a difference equation we 
have 

$$L_i - 1 = p L_{i+1} + q L_{i-1}$$ 

Since the equation is now inhomogeneous, the solution is a linear combination of the 
homogeneous solutions (that we saw before) and a special solution. We have 

$$L_i = \frac{i}{q - p} + A_1 + A_2 \exp[i \log(q/p)]$$ 

If the walk starts at a = -1 or at b then it is terminated immediately and the walk length 
is zero. We therefore have the conditions $L_a = L_b = 0$, which are sufficient to obtain the 
final solution for the length of the alignment starting at zero. 
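
Carrying out the last step, the two conditions $L_a = L_b = 0$ give 
$L_0 = [-a - (b - a) f_0] / (q - p)$, where $f_0$ is the probability of terminating at b 
found earlier. This expression is derived here and is not quoted from the text, so the 
Python sketch below checks it against a direct solution of the difference equation 
(function names are assumptions; integer boundaries with a < 0 < b are assumed). 

```python
def mean_length_closed_form(p, a, b):
    """L_0 = (-a - (b - a) * f_0) / (q - p), with f_0 = (1 - r^a) / (r^b - r^a)."""
    q = 1.0 - p
    r = q / p
    f0 = (1.0 - r ** a) / (r ** b - r ** a)
    return (-a - (b - a) * f0) / (q - p)

def mean_length_by_recurrence(p, a, b):
    """Solve L_i = 1 + p*L_{i+1} + q*L_{i-1} with L_a = L_b = 0.

    The solution is written as L_i = u_i + x * v_i, where u solves the equation
    with L_{a+1} = 0, v solves the homogeneous equation with v_{a+1} = 1, and
    x is fixed by the condition L_b = 0.
    """
    q = 1.0 - p
    u = {a: 0.0, a + 1: 0.0}
    v = {a: 0.0, a + 1: 1.0}
    for i in range(a + 1, b):
        u[i + 1] = (u[i] - 1.0 - q * u[i - 1]) / p
        v[i + 1] = (v[i] - q * v[i - 1]) / p
    x = -u[b] / v[b]
    return u[0] + x * v[0]

print(mean_length_closed_form(0.4, -1, 2), mean_length_by_recurrence(0.4, -1, 2))
```

For p = 0.4, a = -1 and b = 2 both give a mean length of about 1.84 steps. 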
The extreme value distribution 

A single alignment may include more than a single maximum before hitting the 
termination point. An alignment (and a random walk) can go up and down several times 
before reaching the absorbing boundary a. We usually pick the local alignment with the 
largest score and trace it back from that position. It is therefore insufficient to find an 
arbitrary maximum of the alignment. We need instead to determine the maximum of the 
maxima. Consideration of the probability of the maximum of the maxima requires a little 
more work, which is outlined below. 

Consider a set of N random numbers that are independent and sampled from the same 
distribution function, $\{T_1, T_2, \ldots, T_N\}$. With respect to the alignments we assume that the 
maxima along the walk are independent. One of these numbers is larger than the rest, and 
we call it $T_{max}$. The random numbers (local maxima of the alignment), the $T_i$'s, are assumed 
for simplicity to be continuous and sampled from a probability density $p(T)$, with 
$T \in [-\infty, \infty]$ and $\int_{-\infty}^{\infty} p(T)\,dT = 1$. We are asked to compute the probability 
density of $T_{max}$, $p_{max}(T_{max})$. This probability can be estimated using computer experiments 
as follows: Generate N random numbers sampled from the $p(T)$ probability density. From the N 
random numbers we select the maximum, $T_{max}(1)$. This experiment is repeated, say L 
times, to produce L maximal numbers $\{T_{max}(j)\}_{j=1}^{L}$. The resulting numbers are 
histogrammed to estimate their probability density, $p_{max}(T_{max})$. 
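
The computer experiment takes only a few lines of Python. In this sketch the uniform 
density on [0, 1) is used as a stand-in for $p(T)$, and the values N = 10 and L = 5000 
repetitions are arbitrary choices made for the illustration. 

```python
import random

def sample_maxima(draw, n, repeats):
    """Repeat `repeats` times: draw n random numbers and keep the largest one."""
    return [max(draw() for _ in range(n)) for _ in range(repeats)]

rng = random.Random(1)
maxima = sample_maxima(rng.random, 10, 5000)   # N = 10 numbers, L = 5000 repetitions
mean_max = sum(maxima) / len(maxima)
print(mean_max)   # close to N / (N + 1) for the uniform density on [0, 1)
```

The list `maxima` can then be histogrammed to estimate the density of the maximum; already 
its mean, near 10/11, shows how far the distribution of the maximum is shifted from that of 
a single uniform number (mean 1/2). 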

Analytically the following procedure is used to estimate $p_{max}(T)$ from $p(T)$. We 
consider the distribution functions $Q_{max}(T)$ and $Q(T)$ that are defined as follows 

$$Q_{max}(T) = \int_{-\infty}^{T} p_{max}(T')\,dT'$$ 

$$Q(T) = \int_{-\infty}^{T} p(T')\,dT'$$ 

The relationships between the Qs and the ps are obvious. However, we only know 
$p(T)$ and therefore also $Q(T)$. $Q_{max}(T)$ is the probability of sampling a $T_{max}$ that is 
smaller than T. The probability that the maximum value of the set, $T_{max}$, is less than a 
particular value, T, is the same as the probability that each member of the set, $T_i$, is also 
smaller than T, that is 

$$Q(T_{max} < T) = \prod_{i=1}^{N} Q(T_i < T)$$ 

In terms of the probability density, $p_{max}(T)$, defined by 
$Q(T_{max} < T) = \int_{-\infty}^{T} p_{max}(T_{max})\,dT_{max}$, we can write 

$$p_{max}(T) = \frac{d}{dT} \prod_{i=1}^{N} Q(T_i < T)$$ 

If all the $Q(T_i < T)$ are the same and are now denoted $Q(T)$, the expression is 
simplified to 

$$p_{max}(T) = N\, p(T)\, Q(T)^{N-1}$$ 



To demonstrate that the new probability density for the maximum number, $p_{max}(T)$, is 
quite different from $p(T)$ let us consider a simple example. Let $p(T)$ be a constant, 
1/L, between 0 and L (a uniform probability density). Then $Q(T)$ is 

$$Q(T) = \int_{0}^{T} p(T')\,dT' = \frac{T}{L}$$ 

and $p_{max}(T)$ is 

$$p_{max}(T) = \frac{N}{L} \left( \frac{T}{L} \right)^{N-1}$$ 

Note also that, in contrast to our prior guess of a Gaussian distribution, the 
$p_{max}(T)$ that corresponds to a Gaussian $p(T)$ is not yet another Gaussian. 

The probability density that we are really interested in (in the context of BLAST) is the 
geometric distribution. Here we only state the results of a (conceptually) similar analysis 
to the one above. Let $T_{max}$ be the maximum of n random numbers distributed according to 
the geometric distribution. The probability that $T_{max}$ is smaller than a given value T 
is bounded by the so-called extreme value distribution 

$$\exp[-n \exp(-\lambda T)] \le P(T_{max} < T) \le \exp[-n \exp(-\lambda (T + 1))]$$ 

This asymptotic formula is essentially the same formula we wrote for BLAST, 

$$P(T \ge T_T) = 1 - e^{-y}$$ 

where $y = K \cdot m \cdot n \cdot \exp[-\lambda T_T]$. 

Sometimes we also encounter the E-value, defined as $E = -\log(1 - P)$, so that $E = y$ in the 
formula above. 
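
The relation between the two measures can be made concrete with a short Python sketch 
(the function names are assumptions). For small values the E-value and the probability are 
nearly equal; they differ only when the probability is not small. 

```python
import math

def e_value_from_p(p):
    """E = -log(1 - P)."""
    return -math.log1p(-p)

def p_from_e_value(e):
    """Inverse relation: P = 1 - exp(-E)."""
    return -math.expm1(-e)

for p in (1e-6, 0.01, 0.5):
    print(p, e_value_from_p(p))
```

A probability of 0.01 corresponds to an E-value of about 0.01005, while a probability of 
0.5 corresponds to an E-value of about 0.69. 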
The probability $P(T \ge T_T)$ is also called the P-value. The smaller the P-value, the more 
significant is the score (i.e. it is less likely to be obtained by chance). The parameter K is 
determined (approximately) by $K = (C/L) \exp(-\lambda)$, where C is the coefficient of the 
geometric distribution, analogous to the estimate we made 
earlier, $P(T_{max} > T) = C \exp(-\log(q/p)\, T)$, and L is the typical length of the alignment 
between sequential maxima. We did not discuss the calculation of L. 

This concludes our brief description of the BLAST algorithm. To summarize, BLAST uses statistical theory to estimate the optimal score of two random sequences, which depends on the length of the alignment and the properties of the substitution matrix. Whatever score we obtain when aligning two real sequences, it needs to be much higher than the score of random sequences in order to be significant. BLAST provides a theoretical estimate of the chances that a random sequence will score the same. The search for significant matching segments can be done efficiently using hash tables and related computer science techniques.
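
As a rough illustration of the hash-table idea, the sketch below indexes every word of length k in a database sequence and then looks up the words of a query. It is a simplification that uses exact word matches only; the sequences, the word length, and the function names are hypothetical.

```python
from collections import defaultdict

def build_word_index(sequence, k):
    """Map every length-k word in `sequence` to the list of its start positions."""
    index = defaultdict(list)
    for start in range(len(sequence) - k + 1):
        index[sequence[start:start + k]].append(start)
    return index

def find_seed_hits(query, index, k):
    """Return (query_pos, db_pos) pairs where a query word occurs in the database."""
    hits = []
    for q in range(len(query) - k + 1):
        for d in index.get(query[q:q + k], ()):
            hits.append((q, d))
    return hits

database = "MKVLAAGIVGLKWAAGIV"  # made-up sequences
query = "TTAAGIVPP"
index = build_word_index(database, 3)
print(find_seed_hits(query, index, 3))
```

Building the index costs time proportional to the database length, and each query word is then found with a single dictionary lookup instead of a scan.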

Use of multiple sequence alignments in signal enhancement 

So far we discussed only alignments of pairs of sequences. There is significant information in multiple alignments of related proteins. For example, if a residue is conserved over a range of related sequences, it is more likely to be important for the well-being of that protein. On the other hand, if a specific position in the sequence has amino acids all over the map (e.g., in ten sequences a different amino acid is observed at that position each time), then this site along the sequence is expected to be less significant and should contribute little to the overall score of the alignment. This suggests defining an average score $\langle T \rangle$ such that

$$\langle T \rangle = \sum_{i,\alpha} p(i,\alpha)\, s(\alpha,\beta)$$

where $p(i,\alpha)$ is the probability of finding amino acid $\alpha$ at the $i$-th column of the multiple sequence alignment, and $\beta$ is the amino acid from the database sequence that we compare the multiple sequence alignment to. An alternative way of computing the score is




which puts the emphasis on the probability of observing amino acid $\beta$ at a specific column.
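
The average score defined above, the sum over columns i and amino acids α of p(i,α) s(α,β), can be sketched as follows. The three-letter alphabet, the score values, and the profile are toy assumptions made for illustration; a real calculation would use all twenty amino acids and a substitution matrix such as BLOSUM.

```python
def average_profile_score(profile, target, score):
    """Sum over columns i and residues a of p(i, a) * s(a, b_i),
    where b_i is the target residue aligned to column i."""
    if len(profile) != len(target):
        raise ValueError("profile and target must have the same length")
    total = 0.0
    for column, b in zip(profile, target):
        for a, p in column.items():
            total += p * score[(a, b)]
    return total

# Toy symmetric scores on a three-letter alphabet (hypothetical values)
toy = {("A", "A"): 4, ("G", "G"): 6, ("W", "W"): 11,
       ("A", "G"): 0, ("A", "W"): -3, ("G", "W"): -2}
score = dict(toy)
score.update({(b, a): v for (a, b), v in toy.items()})

# One dictionary of residue probabilities per alignment column
profile = [{"A": 1.0}, {"A": 0.5, "G": 0.5}, {"W": 0.8, "G": 0.2}]
print(average_profile_score(profile, "AGW", score))
```

A fully conserved column contributes the full substitution score, while a mixed column contributes a weighted average.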

We hope to estimate the above probability directly from the multiple sequence alignment. 
A multiple sequence alignment will look something like 

We do not discuss here how the multiple sequence alignment was obtained and assume that it is given. Multiple sequence alignment is actually a hard problem that we shall discuss later. Typically we are able to find and align on the order of 10 different sequences. It is clear that we will suffer from a severe sampling problem. Since there are twenty amino acids, it is impossible for such limited sampling to yield frequencies that are good approximations to the true probabilities. For example, it is possible that amino acid F will not appear at all at a given column. This will immediately penalize a novel sequence that includes F in that position.
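
The sampling problem is easy to see on a toy example. The sketch below extracts raw frequencies from one column of a small alignment; the sequences are made up, gaps are written as '-', and the function name is hypothetical.

```python
from collections import Counter

def column_frequencies(alignment, column):
    """Raw residue frequencies in one alignment column (gaps '-' are skipped)."""
    residues = [seq[column] for seq in alignment if seq[column] != "-"]
    total = len(residues)
    if total == 0:
        return {}
    counts = Counter(residues)
    return {a: c / total for a, c in counts.items()}

# Hypothetical aligned sequences
alignment = ["MKVLA", "MRVLA", "MKILG", "MKV-A"]
freqs = column_frequencies(alignment, 4)
print(freqs)
print(freqs.get("F", 0.0))  # F was never observed in this column
```

Any amino acid that happens not to occur in the few available sequences gets a raw frequency of exactly zero.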

It is possible to overcome (partially) the undersampling problem by using the "null" hypothesis, that is, by adding more statistics from a known distribution of amino acids that is not necessarily related to the sequences at hand. We compute the probability $p(i,\alpha)$ as

$$p(i,\alpha) = \frac{n_{i\alpha} + b_{i\alpha}}{N_i + B_i}$$

where $B_i$ is the total number of pseudo-counts at column $i$ and $b_{i\alpha}$ is the number of pseudo-counts at column $i$ for amino acid $\alpha$. Similarly, $N_i$ (and $n_{i\alpha}$) are the actual numbers of sequences (and of amino acids of type $\alpha$) that we have in the multiple sequence alignment at column $i$. The pseudo-count parameters are unknown.
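
A minimal sketch of the pseudo-count formula for a single column is given below. The observed counts and the pseudo-counts are made-up numbers, and the function name is hypothetical. N_i is taken as the sum of the observed counts, which assumes that there are no gaps in the column.

```python
def smoothed_probabilities(counts, pseudo_counts):
    """p(i, a) = (n_ia + b_ia) / (N_i + B_i) for a single column i."""
    N = sum(counts.values())          # N_i: observed residues in the column
    B = sum(pseudo_counts.values())   # B_i: total pseudo-counts in the column
    residues = set(counts) | set(pseudo_counts)
    return {a: (counts.get(a, 0) + pseudo_counts.get(a, 0.0)) / (N + B)
            for a in residues}

# Hypothetical column: 10 sequences, none containing F
counts = {"A": 7, "G": 3}
pseudo = {"A": 0.5, "G": 0.5, "F": 1.0}  # made-up pseudo-counts, B_i = 2
p = smoothed_probabilities(counts, pseudo)
print(p["F"])  # greater than zero although F was never observed
```

The probabilities still sum to one, and the weight of the pseudo-counts, B_i / (N_i + B_i), shrinks as more sequences are added.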

The most straightforward approach to estimating $b_{i\alpha}$ is $b_{i\alpha} = p(\alpha)\,B_i$. Here we are getting $p(\alpha)$ from the known (general) distribution of amino acids. This choice is not optimal since it does not use the information we have so far at all. True, the information is limited, but it is still more than zero. A possible way of generating pseudo-counts is

$$b_{i\alpha} = B_i \sum_{\beta} p(\alpha,\beta)\, p(i,\beta)$$

The probability $p(i,\beta)$ is computed directly from the multiple sequence alignment. It is the raw frequency extracted from the limited number of sequences that we have. The raw frequencies are multiplied by a conditional probability of pairs of amino acids, $p(\alpha,\beta)$, which is extracted (for example) from the BLOSUM matrix. The substitution probabilities of the BLOSUM matrix are not zero, and therefore the probability of observing any amino acid will not be zero, though some of the amino acids will be highly unlikely. For example, if we have only W at column $i$, the probability of observing R will be particularly small. The proposed way of generating pseudo-counts is attractive since it uses the known statistics of amino acids to generate additional counts that we were likely to miss.
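
The substitution-based pseudo-counts can be sketched on a toy three-letter alphabet. The conditional probabilities below are invented for illustration and are not BLOSUM values; they are arranged so that, for each observed amino acid β, the probabilities over α sum to one. The function name is hypothetical.

```python
def substitution_pseudo_counts(raw_freqs, cond_prob, B):
    """b_ia = B * sum over beta of p(a | beta) * p(i, beta) for one column i."""
    return {a: B * sum(cond_prob[a][beta] * f for beta, f in raw_freqs.items())
            for a in cond_prob}

# Toy conditional probabilities, indexed as cond_prob[a][beta].
# Each beta column sums to 1. Values are hypothetical.
cond_prob = {
    "W": {"W": 0.90, "F": 0.15, "R": 0.02},
    "F": {"W": 0.08, "F": 0.80, "R": 0.03},
    "R": {"W": 0.02, "F": 0.05, "R": 0.95},
}
raw = {"W": 1.0}  # the column contains only W
b = substitution_pseudo_counts(raw, cond_prob, B=2.0)
print(b)
```

With only W observed, F receives more pseudo-counts than R in this toy table, and no amino acid is left with zero.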

How should we handle the total number of pseudo-counts, $B_i$? We have a number of expectations. For example, if we have a lot of data then $B_i$ should decrease. Also, if there is a great diversity of amino acids at a given column, then pseudo-counts are more important. However, if we see exactly the same amino acid ten times, it is less likely that we missed something and that we need to generate a lot of pseudo-counts.

A common-sense choice is $B_i = N \cdot V_i$, where $N$ is an empirical constant and $V_i$ is the measure of amino acid diversity at position $i$.
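
As a sketch of this choice, the code below takes the diversity measure V_i to be the number of distinct amino acids seen in the column. That definition is an assumption made here for illustration, as are the value of the constant and the function name.

```python
def total_pseudo_counts(column_residues, N_const):
    """B_i = N * V_i, taking V_i as the number of distinct residues in the column."""
    diversity = len(set(r for r in column_residues if r != "-"))
    return N_const * diversity

N_const = 5.0  # empirical constant; the value here is arbitrary
print(total_pseudo_counts("WWWWWWWWWW", N_const))  # one residue type
print(total_pseudo_counts("AGFWRKDEAL", N_const))  # many residue types
```

A perfectly conserved column gets the smallest possible B_i, while a highly variable column gets many more pseudo-counts.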